Serving GridFS files with pyramid - python

I am wondering what is the best and possibly easiest way to serve files from GridFS using Pyramid. I use nginx as a proxy server (for ssl) and waitress as my application server.
The file types I need to be able to serve are the following: mp3, pdf, jpg, png
The files should be accessible through the following url "/files/{userid}/{filename}"
Right now the files are opened by the right application on client-side because I explicitly set the content-type in my code like so:
if filename[-3:] == "pdf":
    response = Response(content_type='application/pdf')
elif filename[-3:] in ["jpg", "png"]:
    response = Response(content_type='image/*')
elif filename[-3:] in ["mp3"]:
    response = Response(content_type='audio/mp3')
else:
    response = Response(content_type="application/*")
response.app_iter = file  # file is a GridFS file object
return response
The only thing is that I can't stream the mp3s properly. I use audio.js to play them. They open up and play but no track length is shown and I can't seek them. I know it has something to do with the "accept-ranges" property but I can't seem to set it right. Does it have to do with nginx or waitress? Or am I just not setting the header correctly?
I would like to use something as easy as return FileResponse(file) like specified here but my file does not come from the filesystem directly... Is there a plug and play way to make this work?
Any advice would be really appreciated!
Thank you very much for your help!
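On the content-type part of the question: the extension checks could probably be replaced by the standard library's mimetypes module, which maps file names to concrete registered types instead of wildcards such as image/*. A minimal sketch, assuming a hypothetical helper named guess_content_type; it does not address the seeking problem:

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Return a MIME type for filename, falling back to a generic binary type."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default
```

The result could then be passed on, e.g. Response(content_type=guess_content_type(filename)).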

I found a solution on this blog.
The idea is to use a patched DataApp from paste.fileapp. All the details are in the post, and now my app behaves just like I want!

I just solved the problem without the paste dependency in Pyramid 1.4, Python 3.
The two things that seem to matter are passing conditional_response=True and setting content_length:
f = request.db.fs.files.find_one({'filename': filename, 'metadata.bucket': bucket})
fs = gridfs.GridFS(request.db)
with fs.get(f.get('_id')) as gridout:
    response = Response(content_type=gridout.content_type, body_file=gridout, conditional_response=True)
    response.content_length = f.get('length')
    return response
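For context on why these two settings matter: seeking relies on HTTP range requests, where the client asks for a byte range and the server answers 206 Partial Content with a Content-Range header, which requires knowing the total length. WebOb's conditional_response takes care of this. The following is only a simplified standard-library sketch of the idea, not WebOb's actual implementation; parse_range and partial_headers are hypothetical names, and only single byte ranges are handled:

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, length):
    """Return an inclusive (start, end) pair for a single-range header, or None."""
    match = _RANGE_RE.match((header or "").strip())
    if not match or length <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        count = int(last)
        if count == 0:
            return None
        return max(length - count, 0), length - 1
    start = int(first)
    end = min(int(last), length - 1) if last else length - 1
    if start > end:
        return None  # range starts past the end of the file
    return start, end


def partial_headers(start, end, length):
    """Headers that accompany a 206 Partial Content response."""
    return {
        "Accept-Ranges": "bytes",
        "Content-Range": "bytes %d-%d/%d" % (start, end, length),
        "Content-Length": str(end - start + 1),
    }
```

Without a known total length, neither the "/length" part of Content-Range nor an open-ended range such as "bytes=500-" can be resolved, which would be consistent with the player showing no track length.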

Another way (using Pyramid 1.5.7 on Python 2.7):
from gridfs import GridFS
from pyramid.response import FileIter

fs = GridFS(request.db, 'MyFileCollection')
grid_out = fs.get(file_id)
response = request.response
response.app_iter = FileIter(grid_out)
response.content_disposition = 'attachment; filename="%s"' % grid_out.name
return response

Related

How can I count the number of downloads of a static files on my django website?

I am beginner with django.
On my website I would like to create a library which allows downloads of executable files which I created myself. I would like to count how many times each file has been downloaded.
I thought to use a middleware, knowing that I am able to make a middleware which counts and displays the number of times a page has been viewed :
from django.db.models import F

def stats_middleware(get_response):
    def middleware(request):
        try:
            p = Stat.objects.get(url=request.path)
            p.views_number = F('views_number') + 1
            p.save()
            p.refresh_from_db()  # reload so views_number is a number, not an F() expression
        except Stat.DoesNotExist:
            p = Stat.objects.create(url=request.path)
        response = get_response(request)
        response.content += bytes(
            "this page has been viewed {} times.".format(p.views_number),
            "utf8"
        )
        return response
    return middleware
I thought that if I managed to open the download in a new page, I could count the number of times it appears and thus the number of downloads of the file, but I did not manage to open the download in another tab.
How can I do this?
You do not need middleware. Serving files through Django is a bad idea and very inefficient. Web servers provide a dedicated mechanism for controlling access to files: X-Sendfile in Apache and X-Accel-Redirect in nginx. You only need to set a special response header in your view, and you can count downloads in that same view.
You can read more here: Django - Understanding X-Sendfile
And try to use this package: https://github.com/johnsensible/django-sendfile
Sample code:
from sendfile import sendfile

def download(request):
    # here increment counter of download
    return sendfile(request, file_name)
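To make the idea concrete without depending on Django or the package, here is a rough framework-free sketch of what such a view does: bump a counter, then return only headers and let nginx send the file. The names download_counts and build_offload_headers and the /protected/ prefix are assumptions for illustration; in a real project the counter would live in the database and the prefix would have to match an internal nginx location:

```python
from collections import Counter
from urllib.parse import quote

# Stand-in for a database table of per-file download counts (assumption).
download_counts = Counter()


def build_offload_headers(file_name, internal_prefix="/protected/"):
    """Count one download and return headers that hand the transfer to nginx."""
    download_counts[file_name] += 1
    return {
        # nginx serves the file found under this internal location.
        "X-Accel-Redirect": internal_prefix + quote(file_name),
        # Ask the browser to save the file instead of displaying it.
        "Content-Disposition": 'attachment; filename="%s"' % file_name,
    }
```

Because the counter is incremented in the same function that authorizes the download, every served file is counted exactly once per request.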

txt file downloaded is being viewed as html and that is excluding the empty lines

I have a Python/Flask app that works fine locally. I have deployed it to the cloud (PythonAnywhere) and everything works there as well, except for one file that is sent to the user as a download: it arrives as HTML, so its empty lines are dropped. The file is a .txt file. When the user clicks on it, it opens in Notepad. If the file is opened in Notepad++, the empty lines are there as they should be.
This is the Flask code that sends the file:
response = make_response(result)
response.headers["Content-Disposition"] = "attachment; filename=file_to_user.txt"
If I use "inline" instead of "attachment", the empty lines are shown correctly in the browser.
I've tried adding a "Content-Type: text/plain" header before "Content-Disposition", but I believe that is the default, so it had no effect.
Does anyone know how the user could see the file as plain text instead of HTML when opening it directly, for example with Notepad?
If you're just trying to send an existing file on the server, use send_from_directory.
If you're trying to build a response yourself (for example, because you're generating the data in memory), note that make_response defaults to text/html; it's just a shortcut that isn't applicable in your case. To override that, create the response more directly using app.response_class.
This is a small example demonstrating both techniques.
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route('/file')
def download_file():
    # change app.root_path to whatever the directory actually is
    # this just serves this python file (named example.py) as plain text
    return send_from_directory(
        app.root_path, 'example.py',
        as_attachment=True, mimetype='text/plain'
    )

@app.route('/mem')
def download_mem():
    # instantiate the response class directly
    # pass the mimetype
    r = app.response_class('test data\n\ntest data', mimetype='text/plain')
    # add the attachment header
    r.headers.set('Content-Disposition', 'attachment', filename='test_data.txt')
    return r

app.run('localhost', debug=True)
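What the in-memory variant boils down to at the HTTP level can be sketched as a bare WSGI callable using only the standard library. This is an illustration of the headers involved, not a replacement for the Flask code; download_app is a hypothetical name:

```python
def download_app(environ, start_response):
    """Minimal WSGI app that sends in-memory text as a plain-text attachment."""
    body = "test data\n\ntest data".encode("utf-8")
    start_response("200 OK", [
        # An explicit text/plain type, so nothing falls back to text/html.
        ("Content-Type", "text/plain; charset=utf-8"),
        # The attachment disposition makes the browser offer a download.
        ("Content-Disposition", 'attachment; filename="test_data.txt"'),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It could be tried locally with wsgiref.simple_server.make_server('localhost', 8000, download_app).serve_forever().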

How do you get Google App Engine to gunzip during download?

I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.response.headers['Content-Encoding'] = str('gzip')
        # self.response.headers['Content-type'] = str('application/x-gzip')
        self.response.headers['Content-type'] = str(blob_info.content_type)
        self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'attachment; filename=%s' % (blob_info.filename)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped: its size matches the size of the .gz file on the server, and I can confirm it by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non Google) server, I could accomplish the automatic gunzip by setting same response headers; albeit, my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates an HTTP 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        file = '/gs/mydatabucket/%s' % blob_info.filename
        print file
        self.response.headers['Content-Encoding'] = str('gzip')
        self.response.headers['Content-Type'] = str('application/x-gzip')
        # self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'filename=%s' % (file)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848-byte file to the client before the server terminates and issues a 500 error. When I run 'file' on the partially downloaded file from the command line, Mac OS seems to correctly identify the file format as an 'SQLite 3.x database' file. Any idea why the server returns the 500 error? How can I fix the problem?
You should first check to see if your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as is with the proper content-encoding and content-type headers, otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!)
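The check described above can be sketched with the standard library alone. This is a simplified illustration, not App Engine code: client_accepts_gzip and body_for_client are hypothetical names, and the Accept-Encoding parsing only looks at the gzip token and its q-value:

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value allows gzip."""
    for part in (accept_encoding or "").split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # q=0 means "not acceptable"
            except ValueError:
                return False
        return True
    return False


def body_for_client(gzipped_blob, accept_encoding):
    """Pass the blob through as-is if possible, otherwise decompress it."""
    if client_accepts_gzip(accept_encoding):
        return gzipped_blob, {"Content-Encoding": "gzip"}
    return gzip.decompress(gzipped_blob), {}
```

In a handler, accept_encoding would come from the request's Accept-Encoding header, and the returned headers would be merged into the response alongside the content type of the uncompressed data.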
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
You may use GCS as easily (if not more easily) as you use the Blobstore in App Engine, and it seems to be the preferred storage layer going forward. I say this because the Files API, which made Blobstore interaction easier, has been deprecated, and because a lot of effort has gone into the GCS libraries to make their API similar to Python's built-in file API.
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.
It's worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure that proper gzip compression and content-encoding metadata are used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        url = 'https://storage.googleapis.com/mybucket/' + filename
        self.redirect(url)
GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put non compressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.

Python Flask, how to set 'Content-Type' for static files (js)?

For my Flask app I use 'windows-1251' encoding. To draw a template I set 'Content-Type' as follows:
from flask.helpers import make_response

def render_tmpl_dummy():
    response = make_response("Some Russian text here")
    response.headers['Content-Type'] = 'text/html; charset=windows-1251'
    return response
And all is fine here. But my static JS files are also in 'windows-1251'. So, is there any way to set 'Content-Type=application/x-javascript; charset=windows-1251' for all static files?
(PS: I do not want to convert them to UTF-8 manually in advance, this method is not suitable for me)
This is how I was able to give all files with a custom extension in the static folder a custom MIME type on the development server:
if __name__ == '__main__':
    CUSTOM_FILE_EXTENSION = '.jsonl'
    CUSTOM_MIME_TYPE = 'application/jsonl+json'
    @flask_app.route(flask_app.static_url_path + '/' + '<path:path>' + CUSTOM_FILE_EXTENSION)
    def jsonl_mime_type(path):
        return flask.send_from_directory(
            directory=flask_app.static_folder,
            path=path + CUSTOM_FILE_EXTENSION,
            mimetype=CUSTOM_MIME_TYPE
        )
In effect, I simply reinvented the Flask static web server, which apparently consists of a single call to add_url_rule() in the Flask constructor:
self.add_url_rule(
    f"{self.static_url_path}/<path:filename>",
    endpoint="static",
    host=static_host,
    view_func=lambda **kw: self_ref().send_static_file(**kw),
)
I went to all this trouble so the development server behaved more like production, where Apache has a gentler way to set MIME type.
Your static files shouldn't be served by the web server other than in development, so converting the file encoding is the correct method.
If your reason for not converting the files first is due to volume, see How to convert a file to utf-8 in Python? to see how to automate it.
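As a rough idea of what automating the conversion might look like, here is a small standard-library sketch that rewrites one file in place. transcode_to_utf8 is a hypothetical helper, and it assumes every input really is windows-1251; running it twice on the same file would corrupt it, so keep backups:

```python
from pathlib import Path


def transcode_to_utf8(path, source_encoding="windows-1251"):
    """Rewrite the file at path in place, converting its text to UTF-8."""
    path = Path(path)
    # Work on raw bytes so that line endings are preserved exactly.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return path
```

Applied to a whole static folder, this could be driven by something like: for js_file in Path('static').rglob('*.js'): transcode_to_utf8(js_file).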

Having Django serve downloadable files

I want users on the site to be able to download files whose paths are obscured so they cannot be directly downloaded.
For instance, I'd like the URL to be something like this: http://example.com/download/?f=somefile.txt
And on the server, I know that all downloadable files reside in the folder /home/user/files/.
Is there a way to make Django serve that file for download as opposed to trying to find a URL and View to display it?
For the "best of both worlds" you could combine S.Lott's solution with the xsendfile module: django generates the path to the file (or the file itself), but the actual file serving is handled by Apache/Lighttpd. Once you've set up mod_xsendfile, integrating with your view takes a few lines of code:
from django.utils.encoding import smart_str
response = HttpResponse(mimetype='application/force-download') # mimetype is replaced by content_type for django 1.7
response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(file_name)
response['X-Sendfile'] = smart_str(path_to_file)
# It's usually a good idea to set the 'Content-Length' header too.
# You can also set any other required headers: Cache-Control, etc.
return response
Of course, this will only work if you have control over your server, or your hosting company has mod_xsendfile already set up.
EDIT:
mimetype is replaced by content_type for django 1.7
response = HttpResponse(content_type='application/force-download')
EDIT:
For nginx check this, it uses X-Accel-Redirect instead of apache X-Sendfile header.
A "download" is simply an HTTP header change.
See http://docs.djangoproject.com/en/dev/ref/request-response/#telling-the-browser-to-treat-the-response-as-a-file-attachment for how to respond with a download.
You only need one URL definition for "/download".
The request's GET or POST dictionary will have the "f=somefile.txt" information.
Your view function will simply merge the base path with the "f" value, open the file, create and return a response object. It should be less than 12 lines of code.
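One caveat when merging the base path with a user-supplied "f" value is that values such as "../../etc/passwd" must not escape the download folder. A minimal standard-library sketch of that check, assuming the /home/user/files/ folder from the question; resolve_download_path is a hypothetical helper name:

```python
import os

BASE_DIR = "/home/user/files"


def resolve_download_path(requested_name, base_dir=BASE_DIR):
    """Return the absolute path for requested_name, or None if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, requested_name))
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    return candidate if inside else None
```

The view would then return a 404 when the result is None or is not an existing regular file, and only otherwise open it and build the response.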
For a very simple but not efficient or scalable solution, you can just use the built in django serve view. This is excellent for quick prototypes or one-off work, but as has been mentioned throughout this question, you should use something like apache or nginx in production.
import os
from django.views.static import serve

filepath = '/some/path/to/local/file.txt'
return serve(request, os.path.basename(filepath), os.path.dirname(filepath))
S.Lott has the "good"/simple solution, and elo80ka has the "best"/efficient solution. Here is a "better"/middle solution - no server setup, but more efficient for large files than the naive fix:
http://djangosnippets.org/snippets/365/
Basically, Django still handles serving the file but does not load the whole thing into memory at once. This allows your server to (slowly) serve a big file without ramping up the memory usage.
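The core of that approach is reading the file in fixed-size chunks rather than all at once. A minimal sketch of such a chunk iterator, independent of the linked snippet; iter_chunks is a hypothetical name and the 8192-byte chunk size is an arbitrary choice:

```python
def iter_chunks(file_obj, chunk_size=8192):
    """Yield successive chunks from a binary file object until it is exhausted."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

An iterator like this can be handed to a streaming response, so only one chunk is held in memory at any time.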
Again, S.Lott's X-SendFile is still better for larger files. But if you can't or don't want to bother with that, then this middle solution will gain you better efficiency without the hassle.
Just mentioning the FileResponse object available in Django 1.10
Edit: Just ran into my own answer while searching for an easy way to stream files via Django, so here is a more complete example (to future me). It assumes that the FileField name is imported_file
views.py
from django.views.generic.detail import DetailView
from django.http import FileResponse

class BaseFileDownloadView(DetailView):
    def get(self, request, *args, **kwargs):
        filename = self.kwargs.get('filename', None)
        if filename is None:
            raise ValueError("Found empty filename")
        some_file = self.model.objects.get(imported_file=filename)
        response = FileResponse(some_file.imported_file, content_type="text/csv")
        # https://docs.djangoproject.com/en/1.11/howto/outputting-csv/#streaming-large-csv-files
        response['Content-Disposition'] = 'attachment; filename="%s"' % filename
        return response

class SomeFileDownloadView(BaseFileDownloadView):
    model = SomeModel
urls.py
...
url(r'^somefile/(?P<filename>[-\w_\\-\\.]+)$', views.SomeFileDownloadView.as_view(), name='somefile-download'),
...
I tried @Rocketmonkeys' solution, but the downloaded files were being stored as *.bin and given random names. That's not fine, of course. Adding another line from @elo80ka solved the problem.
Here is the code I'm using now:
import os
from wsgiref.util import FileWrapper
from django.http import HttpResponse

filename = "/home/stackoverflow-addict/private-folder(not-porn)/image.jpg"
wrapper = FileWrapper(open(filename, 'rb'))  # binary mode; the Python 2 file() builtin no longer exists
response = HttpResponse(wrapper, content_type='text/plain')
response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(filename)
response['Content-Length'] = os.path.getsize(filename)
return response
You can now store files in a private directory (not inside /media nor /public_html) and expose them via django to certain users or under certain circumstances.
Hope it helps.
Thanks to @elo80ka, @S.Lott and @Rocketmonkeys for the answers; I got the perfect solution by combining all of them =)
It was mentioned above that the mod_xsendfile method does not allow for non-ASCII characters in filenames.
For this reason, I have a patch available for mod_xsendfile that will allow any file to be sent, as long as the name is url encoded, and the additional header:
X-SendFile-Encoding: url
is sent as well.
http://ben.timby.com/?p=149
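On the application side, using such a patch would presumably mean percent-encoding the path before placing it in the header. A small sketch under that assumption (the exact decoding rules of the patched module are not verified here); xsendfile_headers is a hypothetical helper name:

```python
from urllib.parse import quote


def xsendfile_headers(path_to_file):
    """Build X-Sendfile headers with a percent-encoded (UTF-8) file path."""
    return {
        # quote() keeps "/" intact and encodes non-ASCII characters as UTF-8 bytes.
        "X-Sendfile": quote(path_to_file),
        "X-SendFile-Encoding": "url",
    }
```

With this, a path such as /srv/files/résumé.pdf is sent as plain ASCII, so the header value stays valid regardless of the characters in the file name.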
Try: https://pypi.python.org/pypi/django-sendfile/
"Abstraction to offload file uploads to web-server (e.g. Apache with mod_xsendfile) once Django has checked permissions etc."
You should use the sendfile APIs provided by popular servers like Apache or nginx in production. For many years I used the sendfile API of these servers to protect files. I then created a simple middleware-based Django app for this purpose, suitable for both development and production. You can access the source code here.
UPDATE: in the new version, the python provider uses Django's FileResponse if available, and support has been added for many other server implementations, from lighttpd and Caddy to Hiawatha.
Usage
pip install django-fileprovider
add fileprovider app to INSTALLED_APPS settings,
add fileprovider.middleware.FileProviderMiddleware to MIDDLEWARE_CLASSES settings
set FILEPROVIDER_NAME settings to nginx or apache in production, by default it is python for development purpose.
in your class-based or function views, set the response header X-File value to the absolute path of the file. For example:
def hello(request):
    # code to check or protect the file from unauthorized access
    response = HttpResponse()
    response['X-File'] = '/absolute/path/to/file'
    return response
django-fileprovider is implemented in such a way that your code needs only minimal modification.
Nginx configuration
To protect file from direct access you can set the configuration as
location /files/ {
    internal;
    root /home/sideffect0/secret_files/;
}
Here nginx defines a location /files/ that can only be accessed internally. If you are using the above configuration, you can set X-File as:
response['X-File'] = '/files/filename.extension'
By doing this with nginx configuration, the file will be protected & also you can control the file from django views
from django.http import HttpResponse

def qrcodesave(request):
    import urllib2
    url = "http://chart.apis.google.com/chart?cht=qr&chs=300x300&chl=s&chld=H|0"
    opener = urllib2.urlopen(url)
    content_type = "application/octet-stream"
    response = HttpResponse(opener.read(), content_type=content_type)
    response["Content-Disposition"] = "attachment; filename=aktel.png"
    return response
Django recommends that you use another server to serve static media (another server running on the same machine is fine). They recommend servers such as lighttpd.
This is very simple to set up. However, if 'somefile.txt' is generated on request (its content is dynamic), then you may want Django to serve it.
Django Docs - Static Files
Another project to have a look at: http://readthedocs.org/docs/django-private-files/en/latest/usage.html
It looks promising, though I haven't tested it myself yet.
Basically the project abstracts the mod_xsendfile configuration and allows you to do things like:
from django.db import models
from django.contrib.auth.models import User
from private_files import PrivateFileField

def is_owner(request, instance):
    # compare with ==; parentheses let the expression span two lines
    return ((not request.user.is_anonymous()) and request.user.is_authenticated and
            instance.owner.pk == request.user.pk)

class FileSubmission(models.Model):
    description = models.CharField("description", max_length=200)
    owner = models.ForeignKey(User)
    uploaded_file = PrivateFileField("file", upload_to='uploads', condition=is_owner)
I have faced the same problem more than once, so I implemented django-filelibrary using the xsendfile module and auth view decorators. Feel free to use it as inspiration for your own solution.
https://github.com/danielsokolowski/django-filelibrary
Providing protected access to static html folder using https://github.com/johnsensible/django-sendfile: https://gist.github.com/iutinvg/9907731
I did a project on this. You can look at my github repo:
https://github.com/nishant-boro/django-rest-framework-download-expert
This module provides a simple way to serve files for download in django rest framework using Apache module Xsendfile. It also has an additional feature of serving downloads only to users belonging to a particular group
