How to simultaneously upload and download file - python

I am trying to write a small Tornado server that lets users upload files using an HTML form, then gives a link to someone else, who will then download the file simultaneously while it is still being uploaded.
For now the idea was that data would be some sort of iterator created by the upload and then consumed by the download; currently, however, the entire file is written into data at once.
I found a few people talking about chunked file uploads with Tornado, but I couldn't find any reference pages for it.
import os
import tornado.web
import tornado.ioloop

settings = {'debug': True}
data = None

# assumes an <input type="file" name="file" />
class ShareHandler(tornado.web.RequestHandler):
    def post(self, uri):
        global data  # without this, `data` would only be a local variable
        data = self.request.files['file'][0]['body']

class FetchHandler(tornado.web.RequestHandler):
    def get(self, uri):
        for line in data:
            self.write(line)

handlers = [
    (r'/share/(.*)', ShareHandler),
    (r'/fetch/(.*)', FetchHandler),
]

application = tornado.web.Application(handlers, **settings)
application.listen(8888)
tornado.ioloop.IOLoop.instance().start()

Tornado doesn't support streaming uploads. This is one of the most commonly asked questions about it. =) The maintainer is actively implementing the feature; you can watch the progress here:
https://github.com/facebook/tornado/pull/1021
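For reference, this work eventually landed (Tornado 4.0+) as the tornado.web.stream_request_body decorator. A minimal sketch of how it is used (the handler below is illustrative, not the asker's final producer/consumer design):

import tornado.web

@tornado.web.stream_request_body
class StreamingShareHandler(tornado.web.RequestHandler):
    def prepare(self):
        # called once, when the request headers arrive and before any body data
        self.bytes_seen = 0

    def data_received(self, chunk):
        # called repeatedly as body chunks arrive; hand each chunk to a
        # consumer here instead of buffering the whole upload in memory
        self.bytes_seen += len(chunk)

    def post(self, uri):
        self.write("received %d bytes" % self.bytes_seen)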

Related

FastAPI handle request time out

I have an API endpoint that processes an uploaded CSV file and passes it to a scraper function; after scraping is done, the scraped result is downloaded as a CSV file generated by the scraper. Everything works as I expected, but when I deploy it on Heroku, a few seconds after uploading the CSV file I get a 503 response, which means the request timed out. So, I want to ask how I can handle the request timeout properly, so that the app won't crash and will keep running until the scraper has finished and the file has been downloaded.
import fastapi as _fastapi
from fastapi.responses import HTMLResponse, FileResponse
import shutil
import os
from scraper import run_scraper

app = _fastapi.FastAPI()

@app.get("/")
def index():
    content = """
<body>
<form method="post" action="/api/v1/scraped_csv" enctype="multipart/form-data">
<input name="csv_file" type="file" multiple>
<input type="submit">
</form>
</body>
"""
    return HTMLResponse(content=content)

@app.post("/api/v1/scraped_csv")
async def extract_ads(csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    await run_scraper(temp_file)
    # `clean_file` is assumed to be the CSV filename produced by run_scraper
    csv_path = os.path.abspath(clean_file)
    return FileResponse(path=csv_path, media_type="text/csv", filename=clean_file)

def _save_file_to_disk(uploaded_file, path=".", save_as="default"):
    extension = os.path.splitext(uploaded_file.filename)[-1]
    temp_file = os.path.join(path, save_as + extension)
    with open(temp_file, "wb") as buffer:
        shutil.copyfileobj(uploaded_file.file, buffer)
    return temp_file
Here is the link for the app.
https://fast-api-google-ads-scraper.herokuapp.com/
I currently see two possibilities.
First, you can increase the waiting time before the timeout. If you use Gunicorn, you can pass -t INT or --timeout INT, where the value is a positive number or 0; setting it to 0 disables timeouts for all workers entirely, which effectively means infinite timeouts.
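For example, a hedged one-liner (the main:app module path and the uvicorn worker class are assumptions about your setup):

    gunicorn -k uvicorn.workers.UvicornWorker --timeout 0 main:app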
Second, you can use an asynchronous request/response pattern: respond immediately with a 202 and tell the client where it can track the status of the task. This requires creating a new endpoint, new logic, etc.
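A minimal sketch of that 202 pattern using FastAPI's BackgroundTasks (the in-memory jobs dict and run_job are placeholders for real storage and work; _save_file_to_disk is the helper from the question):

import uuid
import fastapi as _fastapi

app = _fastapi.FastAPI()
jobs = {}  # hypothetical in-memory job store; use Redis or a database in practice

def run_job(job_id: str, path: str):
    jobs[job_id] = "running"
    # the long-running scraping would happen here
    jobs[job_id] = "done"

@app.post("/api/v1/scraped_csv", status_code=202)
async def extract_ads(background_tasks: _fastapi.BackgroundTasks,
                      csv_file: _fastapi.UploadFile = _fastapi.File(...)):
    job_id = str(uuid.uuid4())
    temp_file = _save_file_to_disk(csv_file, path="temp", save_as="temp")
    # schedule the work to run after the 202 response has been sent
    background_tasks.add_task(run_job, job_id, temp_file)
    return {"job_id": job_id, "status_url": f"/api/v1/jobs/{job_id}"}

@app.get("/api/v1/jobs/{job_id}")
async def job_status(job_id: str):
    return {"job_id": job_id, "status": jobs.get(job_id, "unknown")}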
One possible solution would be what @fchancel is suggesting: run scraping as a background task via a Redis Queue and inform the user that a job with job_id (the key in Redis) has been created. The worker dynos can store the results of the background job in blob storage, and you can fetch the results from the blob storage using job_id.
To check the status of the job, kindly have a look at this question.
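A hedged sketch of that Redis Queue variant (it assumes a default local Redis, a worker dyno started with `rq worker`, and that run_scraper can be invoked synchronously on the worker):

from redis import Redis
from rq import Queue
from scraper import run_scraper

queue = Queue(connection=Redis())  # assumes the default Redis connection

def enqueue_scrape(temp_file: str) -> str:
    # run_scraper executes on the worker dyno, not in the web process
    job = queue.enqueue(run_scraper, temp_file)
    return job.id  # hand this back to the client so it can poll for status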

How to receive uploaded file with Klein like Flask in python

When setting up a Flask server, we can receive the file the user uploaded with:
imagefile = flask.request.files['imagefile']
filename_ = str(datetime.datetime.now()).replace(' ', '_') + \
    werkzeug.secure_filename(imagefile.filename)
filename = os.path.join(UPLOAD_FOLDER, filename_)
imagefile.save(filename)
logging.info('Saving to %s.', filename)
image = exifutil.open_oriented_im(filename)
Looking at the Klein documentation, I've seen http://klein.readthedocs.io/en/latest/examples/staticfiles.html, but this seems to be about serving files from the web service rather than receiving a file that has been uploaded to it. If I want my Klein server to be able to receive an abc.jpg and save it to the file system, is there any documentation that can guide me towards that objective?
As Liam Kelly commented, the snippets from this post should work. Using cgi.FieldStorage makes it possible to easily capture file metadata without sending it explicitly. A Klein/Twisted approach would look something like this:
from cgi import FieldStorage
from klein import Klein
from werkzeug import secure_filename

app = Klein()

@app.route('/')
def formpage(request):
    return '''
    <form action="/images" enctype="multipart/form-data" method="post">
    <p>
    Please specify a file, or a set of files:<br>
    <input type="file" name="datafile" size="40">
    </p>
    <div>
    <input type="submit" value="Send">
    </div>
    </form>
    '''

@app.route('/images', methods=['POST'])
def processImages(request):
    method = request.method.decode('utf-8').upper()
    content_type = request.getHeader('content-type')
    img = FieldStorage(
        fp=request.content,
        headers=request.getAllHeaders(),
        environ={'REQUEST_METHOD': method, 'CONTENT_TYPE': content_type})
    name = secure_filename(img[b'datafile'].filename)
    with open(name, 'wb') as fileOutput:
        # fileOutput.write(img['datafile'].value)
        fileOutput.write(request.args[b'datafile'][0])

app.run('localhost', 8000)
For whatever reason, my Python 3.4 (Ubuntu 14.04) version of cgi.FieldStorage doesn't return the correct results; I tested this on Python 2.7.11 and it works fine. That being said, you could also collect the filename and other metadata on the frontend and send them in an AJAX call to Klein, so you won't have to do too much processing on the backend (which is usually a good thing). Alternatively, you could figure out how to use the utilities provided by werkzeug: the werkzeug.secure_filename function and request.files (i.e. FileStorage) aren't particularly difficult to implement or recreate.
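A hedged sketch of that werkzeug route, feeding the raw Twisted request body through werkzeug's form parser (the environ keys shown are assumptions about the minimum parse_form_data needs; error handling and missing-header cases are omitted):

from werkzeug.formparser import parse_form_data

def parse_twisted_upload(request):
    # build a minimal WSGI-style environ around the Twisted request
    environ = {
        'REQUEST_METHOD': 'POST',
        'CONTENT_TYPE': request.getHeader('content-type'),
        'CONTENT_LENGTH': request.getHeader('content-length'),
        'wsgi.input': request.content,
    }
    stream, form, files = parse_form_data(environ)
    return files  # a MultiDict of werkzeug FileStorage objects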

App Engine Backends Configuration (python)

I get a timeout error on my GAE server when it tries to send large files to an EC2 REST server. The Backends Python API looked like a good solution for my case, but I had some problems configuring it.
Following some instructions, I added a simple backends.yaml to my project folder. But I still received the following error, which seems to mean I failed to create a backend instance:
File "\Google\google_appengine\google\appengine\api\background_thread\background_thread.py", line 84, in start_new_background_thread
raise ERROR_MAP[error.application_error](error.error_detail)
FrontendsNotSupported
Below is my code, and my question is: currently I get the timeout error in OutputPage.py; how do I make this script run on a backend instance?
Update: Following Jimmy Kane's suggestions, I created a new script przm_batchmodel_backend.py for the backend instance. After starting my GAE, I now have two ports (a default and a backend) running my site. Is that correct?
app.yaml
- url: /backend.html
  script: przm_batchmodel.py

backends.yaml
backends:
- name: mybackend
  class: B1
  instances: 1
  options: dynamic
OutputPage.py
from przm import przm_batchmodel
from google.appengine.api import background_thread

class OutputPage(webapp.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        thefile = form['upfile']
        # this is the old way to initiate calculations
        # html = przm_batchmodel.loop_html(thefile)
        przm_batchoutput_backend.przmBatchOutputPageBackend(thefile)
        self.response.out.write(html)

app = webapp.WSGIApplication([('/.*', OutputPage)], debug=True)
przm_batchmodel.py
def loop_html(thefile):
    # parses the uploaded csv and sends its info to the REST server; the return value is an html page
    data = csv.reader(thefile.file.read().splitlines())
    response = urlfetch.fetch(url=REST_server, payload=data, method=urlfetch.POST, headers=http_headers, deadline=60)
    return response
przm_batchmodel_backend.py
class BakendHandler(webapp.RequestHandler):
    def post(self):
        t = background_thread.BackgroundThread(target=przm_batchmodel.loop_html, args=[thefile])
        t.start()

app = webapp.WSGIApplication([('/backend.html', BakendHandler)], debug=True)
You need to create an application file/script for the backend to work, just like you do with the main one. So something like:
app.yaml
- url: /backend.html
  script: przm_batchmodel.py

and in przm_batchmodel.py:
class BakendHandler(webapp.RequestHandler):
    def post(self):
        html = 'test'
        self.response.out.write(html)

app = webapp.WSGIApplication([('/backend.html', BakendHandler)], debug=True)
May I also suggest using the new modules feature, which is easier to set up?
Edit due to comment:
Possibly the setup was not your problem. From the docs:

Code running on a backend can start a background thread, a thread that can "outlive" the request that spawns it. They allow backend instances to perform arbitrary periodic or scheduled tasks or to continue working in the background after a request has returned to the user.

You can only use a background thread on backends.
So, edit again: move this part of the code:

    t = background_thread.BackgroundThread(target=przm_batchmodel.loop_html, args=[thefile])
    t.start()
    self.response.out.write(html)

to the backend app.

Python server for streaming request body content

I am trying to create an intelligent Python proxy server that should be able to stream large request body content from the client to internal storage (which may be Amazon S3, Swift, FTP, or something similar). Before streaming, the server should query an internal API server that determines the parameters for the upload to internal storage. The main restriction is that it must be done in a single HTTP operation with the PUT method. It should also work asynchronously, because there will be a lot of file uploads.
Which solution allows me to read chunks from the uploaded content and start streaming those chunks to internal storage before the user has finished uploading the whole file? All the Python web applications I know of wait until the whole content has been received before handing control to the WSGI application.
One of the solutions I found is a tornado fork, https://github.com/nephics/tornado, but it is unofficial and the tornado developers are in no hurry to merge it into the main branch.
So maybe you know of some existing solutions for my problem? Tornado? Twisted? gevent?
Here's an example of a server that does streaming upload handling, written with Twisted:
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from twisted.web.server import Request, Site
from twisted.web.resource import Resource
from twisted.application.service import Application
from twisted.application.internet import StreamServerEndpointService

# Define a Resource class that doesn't really care what requests are made of it.
# This simplifies things since it lets us mostly ignore Twisted Web's resource
# traversal features.
class StubResource(Resource):
    isLeaf = True

    def render(self, request):
        return b""

class StreamingRequestHandler(Request):
    def handleContentChunk(self, chunk):
        # `chunk` is part of the request body.
        # This method is called as the chunks are received.
        Request.handleContentChunk(self, chunk)
        # Unfortunately you have to use a private attribute to learn where
        # the content is being sent.
        path = self.channel._path
        print("Server received %d more bytes for %s" % (len(chunk), path))

class StreamingSite(Site):
    requestFactory = StreamingRequestHandler

application = Application("Streaming Upload Server")
factory = StreamingSite(StubResource())
endpoint = serverFromString(reactor, b"tcp:8080")
StreamServerEndpointService(endpoint, factory).setServiceParent(application)
This is a .tac file (put it in streamingserver.tac and run twistd -ny streamingserver.tac).
Because of the need to use self.channel._path, this isn't a completely supported approach, and the API overall is pretty clunky, so this is more an example that it's possible than that it's good. There has long been an intent to make this sort of thing easier (http://tm.tl/288), but it will probably be a long while yet before it is accomplished.
It seems I have a solution using the gevent library and monkey patching:
from gevent.monkey import patch_all
patch_all()

from gevent.pywsgi import WSGIServer

def stream_to_internal_storage(data):
    pass

def simple_app(environ, start_response):
    bytes_to_read = 1024
    while True:
        readbuffer = environ["wsgi.input"].read(bytes_to_read)
        if not readbuffer:
            break
        stream_to_internal_storage(readbuffer)
    start_response("200 OK", [("Content-type", "text/html")])
    return ["hello world"]

def run():
    config = {'host': '127.0.0.1', 'port': 45000}
    server = WSGIServer((config['host'], config['port']), application=simple_app)
    server.serve_forever()

if __name__ == '__main__':
    run()
It works well when I try to upload a huge file:

    curl -i -X PUT --progress-bar --verbose --data-binary @/path/to/huge/file "http://127.0.0.1:45000"

Having Django serve downloadable files

I want users on the site to be able to download files whose paths are obscured so they cannot be directly downloaded.
For instance, I'd like the URL to be something like this: http://example.com/download/?f=somefile.txt
And on the server, I know that all downloadable files reside in the folder /home/user/files/.
Is there a way to make Django serve that file for download as opposed to trying to find a URL and View to display it?
For the "best of both worlds" you could combine S.Lott's solution with the xsendfile module: django generates the path to the file (or the file itself), but the actual file serving is handled by Apache/Lighttpd. Once you've set up mod_xsendfile, integrating with your view takes a few lines of code:
from django.utils.encoding import smart_str
response = HttpResponse(mimetype='application/force-download') # mimetype is replaced by content_type for django 1.7
response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(file_name)
response['X-Sendfile'] = smart_str(path_to_file)
# It's usually a good idea to set the 'Content-Length' header too.
# You can also set any other required headers: Cache-Control, etc.
return response
Of course, this will only work if you have control over your server, or your hosting company has mod_xsendfile already set up.
EDIT:
mimetype is replaced by content_type for django 1.7
response = HttpResponse(content_type='application/force-download')
EDIT:
For nginx, check this; it uses the X-Accel-Redirect header instead of Apache's X-Sendfile.
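A hedged sketch of that nginx variant (the /protected/ internal location and the filename handling are assumptions; nginx must map that location with `internal;`, as shown later in this thread):

from django.http import HttpResponse

def nginx_download(request, name):
    response = HttpResponse()
    response['Content-Disposition'] = 'attachment; filename=%s' % name
    # nginx intercepts this header and serves the file from the internal location
    response['X-Accel-Redirect'] = '/protected/%s' % name
    return response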
A "download" is simply an HTTP header change.
See http://docs.djangoproject.com/en/dev/ref/request-response/#telling-the-browser-to-treat-the-response-as-a-file-attachment for how to respond with a download.
You only need one URL definition for "/download".
The request's GET or POST dictionary will have the "f=somefile.txt" information.
Your view function will simply merge the base path with the "f" value, open the file, and create and return a response object. It should be less than 12 lines of code.
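A minimal sketch of that view (the base path comes from the question; FileResponse's as_attachment/filename arguments assume Django 2.0+):

import os
from django.http import FileResponse, Http404

BASE_PATH = '/home/user/files/'

def download(request):
    # basename() strips directory components, blocking traversal like f=../../etc/passwd
    name = os.path.basename(request.GET.get('f', ''))
    path = os.path.join(BASE_PATH, name)
    if not name or not os.path.isfile(path):
        raise Http404
    return FileResponse(open(path, 'rb'), as_attachment=True, filename=name)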
For a very simple, but not efficient or scalable, solution you can just use the built-in Django serve view. This is excellent for quick prototypes or one-off work, but as has been mentioned throughout this question, you should use something like Apache or nginx in production.
import os
from django.views.static import serve

def download_view(request):
    filepath = '/some/path/to/local/file.txt'
    return serve(request, os.path.basename(filepath), os.path.dirname(filepath))
S.Lott has the "good"/simple solution, and elo80ka has the "best"/efficient solution. Here is a "better"/middle solution - no server setup, but more efficient for large files than the naive fix:
http://djangosnippets.org/snippets/365/
Basically, Django still handles serving the file but does not load the whole thing into memory at once. This allows your server to (slowly) serve a big file without ramping up the memory usage.
Again, S.Lott's X-SendFile is still better for larger files. But if you can't or don't want to bother with that, then this middle solution will gain you better efficiency without the hassle.
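The gist of that snippet, as a minimal sketch (the path is a placeholder; FileWrapper yields fixed-size chunks so the whole file never sits in memory):

import os
from wsgiref.util import FileWrapper
from django.http import StreamingHttpResponse

def stream_download(request):
    path = '/home/user/files/somefile.txt'  # assumed location
    wrapper = FileWrapper(open(path, 'rb'), 8192)  # 8 KB chunks
    response = StreamingHttpResponse(wrapper, content_type='application/octet-stream')
    response['Content-Length'] = os.path.getsize(path)
    response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(path)
    return response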
Just mentioning the FileResponse object, available in Django 1.10.
Edit: Just ran into my own answer while searching for an easy way to stream files via Django, so here is a more complete example (for future me). It assumes that the FileField name is imported_file.
views.py
from django.views.generic.detail import DetailView
from django.http import FileResponse

class BaseFileDownloadView(DetailView):
    def get(self, request, *args, **kwargs):
        filename = self.kwargs.get('filename', None)
        if filename is None:
            raise ValueError("Found empty filename")
        some_file = self.model.objects.get(imported_file=filename)
        response = FileResponse(some_file.imported_file, content_type="text/csv")
        # https://docs.djangoproject.com/en/1.11/howto/outputting-csv/#streaming-large-csv-files
        response['Content-Disposition'] = 'attachment; filename="%s"' % filename
        return response

class SomeFileDownloadView(BaseFileDownloadView):
    model = SomeModel

urls.py
...
url(r'^somefile/(?P<filename>[-\w.]+)$', views.SomeFileDownloadView.as_view(), name='somefile-download'),
...
I tried @Rocketmonkeys' solution, but the downloaded files were stored as *.bin and given random names. That's not fine, of course. Adding another line from @elo80ka solved the problem.
Here is the code I'm using now:

import os
from wsgiref.util import FileWrapper
from django.http import HttpResponse

filename = "/home/stackoverflow-addict/private-folder(not-porn)/image.jpg"
wrapper = FileWrapper(open(filename, 'rb'))  # file() is Python 2 only; open() works everywhere
response = HttpResponse(wrapper, content_type='text/plain')
response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(filename)
response['Content-Length'] = os.path.getsize(filename)
return response

You can now store files in a private directory (not inside /media or /public_html) and expose them via Django to certain users or under certain circumstances.
Hope it helps.
Thanks to @elo80ka, @S.Lott and @Rocketmonkeys for the answers; the perfect solution is a combination of all of them =)
It was mentioned above that the mod_xsendfile method does not allow non-ASCII characters in filenames.
For this reason, I have a patch available for mod_xsendfile that will allow any file to be sent, as long as the name is URL-encoded and the additional header

    X-SendFile-Encoding: url

is sent as well.
http://ben.timby.com/?p=149
Try: https://pypi.python.org/pypi/django-sendfile/
"Abstraction to offload file uploads to web-server (e.g. Apache with mod_xsendfile) once Django has checked permissions etc."
You should use the sendfile APIs provided by popular servers like Apache or nginx in production. For many years I used these servers' sendfile APIs to protect files, then created a simple middleware-based Django app for this purpose, suitable for both development and production. You can access the source code here.
UPDATE: in the new version, the python provider uses Django's FileResponse if available, and also adds support for many server implementations, from lighttpd and caddy to hiawatha.
Usage
pip install django-fileprovider
add the fileprovider app to the INSTALLED_APPS setting
add fileprovider.middleware.FileProviderMiddleware to the MIDDLEWARE_CLASSES setting
set the FILEPROVIDER_NAME setting to nginx or apache in production; by default it is python, for development purposes
In your class-based or function views, set the X-File response header to the absolute path of the file. For example:
def hello(request):
    # code to check or protect the file from unauthorized access
    response = HttpResponse()
    response['X-File'] = '/absolute/path/to/file'
    return response
django-fileprovider is implemented in such a way that your code will need only minimal modification.
Nginx configuration
To protect the file from direct access, you can set the configuration as:

location /files/ {
    internal;
    root /home/sideffect0/secret_files/;
}
Here nginx sets a location URL /files/ that is accessible only internally; with the configuration above, you can set X-File as:

    response['X-File'] = '/files/filename.extension'

By doing this with the nginx configuration, the file is protected, and you can also control access to it from your Django views.
def qrcodesave(request):
    import urllib2
    url = "http://chart.apis.google.com/chart?cht=qr&chs=300x300&chl=s&chld=H|0"
    opener = urllib2.urlopen(url)
    content_type = "application/octet-stream"
    response = HttpResponse(opener.read(), content_type=content_type)
    response["Content-Disposition"] = "attachment; filename=aktel.png"
    return response
Django recommends that you use another server to serve static media (another server running on the same machine is fine). They recommend servers such as lighttpd.
This is very simple to set up. However, if 'somefile.txt' is generated on request (i.e. its content is dynamic), then you may want Django to serve it (see the sketch below).
Django Docs - Static Files
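A small sketch of that dynamic case (build_somefile_text is a hypothetical per-request generator standing in for whatever produces the content):

from django.http import HttpResponse

def dynamic_download(request):
    content = build_somefile_text(request)  # hypothetical: generate content per request
    response = HttpResponse(content, content_type='text/plain')
    response['Content-Disposition'] = 'attachment; filename=somefile.txt'
    return response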
Another project to have a look at: http://readthedocs.org/docs/django-private-files/en/latest/usage.html
Looks promising; I haven't tested it myself yet, though.
Basically the project abstracts the mod_xsendfile configuration and allows you to do things like:
from django.db import models
from django.contrib.auth.models import User
from private_files import PrivateFileField

def is_owner(request, instance):
    return (not request.user.is_anonymous()) and request.user.is_authenticated \
        and instance.owner.pk == request.user.pk

class FileSubmission(models.Model):
    description = models.CharField("description", max_length=200)
    owner = models.ForeignKey(User)
    uploaded_file = PrivateFileField("file", upload_to='uploads', condition=is_owner)
I have faced the same problem more than once, so I implemented django-filelibrary using the xsendfile module and auth view decorators. Feel free to use it as inspiration for your own solution.
https://github.com/danielsokolowski/django-filelibrary
Providing protected access to static html folder using https://github.com/johnsensible/django-sendfile: https://gist.github.com/iutinvg/9907731
I did a project on this. You can look at my GitHub repo:
https://github.com/nishant-boro/django-rest-framework-download-expert
This module provides a simple way to serve files for download in Django REST framework using the Apache module Xsendfile. It also has the additional feature of serving downloads only to users belonging to a particular group.
