I would like Google App Engine (GAE) to do something else once my app has sent a response.
The handler would look like this:
    from google.appengine.ext import webapp

    class FooHandler(webapp.RequestHandler):
        def post(self):
            self.response.out.write('Bar')
            send_response()  # this is where I need help!
            do_something_else()  # at this point, the response should have been sent
In case you are wondering why I am trying to do this:
I need thread-like behaviour, which GAE's sandboxed environment does not allow. So one function sends several requests without waiting for the responses. Each request starts a time-consuming operation (fetching resources) and saves the result into the datastore, where the first function can use it.
Note: the request handler has to send a response. If you do not provide one, it will wait for the post function to complete and then return default headers (which, of course, is not the behaviour I'm looking for).
If it helps, the solution might be a custom WSGI middleware, but I have no idea how that works (yet)...
Maybe you can use Task Queues.
As already mentioned, you can use task queues or the deferred API. Another option is outlined by Rafe Kaplan towards the end of his section in this talk here: you can do an asynchronous API call with a result hook function to process the result, and the result hook will be called when the call finishes, after the response is returned to the user!
Presuming you have access to the WSGI layer, you can wrap the WSGI application and provide a callback to be executed once the response has been sent. For how to do this, see:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
Although that is from the mod_wsgi documentation, the cleanup-at-end-of-request example should work with any WSGI-compliant stack.
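A minimal sketch of that wrapping idea, written for illustration rather than copied from the mod_wsgi page: the middleware wraps the iterable returned by the application and runs a callback from its close() method, which a WSGI server calls after it has finished with the response body. The names ResponseWithCallback, run_after_response, log_request and fake_server are hypothetical; fake_server only mimics what a real server does so the order of events is visible.

```python
class ResponseWithCallback:
    """Wraps a WSGI response iterable and runs `callback` from close()."""

    def __init__(self, iterable, callback):
        self.iterable = iterable
        self.callback = callback

    def __iter__(self):
        return iter(self.iterable)

    def close(self):
        try:
            # Keep the wrapped iterable's own cleanup, if it has any.
            if hasattr(self.iterable, "close"):
                self.iterable.close()
        finally:
            # WSGI servers call close() once they are done with the body.
            self.callback()


def run_after_response(app, callback):
    """Hypothetical middleware: call callback(environ) after the response."""
    def wrapper(environ, start_response):
        result = app(environ, start_response)
        return ResponseWithCallback(result, lambda: callback(environ))
    return wrapper


events = []


def log_request(environ):
    # Stand-in for slow work, e.g. writing a request log to a database.
    events.append("logged " + environ["PATH_INFO"])


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Bar"]


app = run_after_response(hello_app, log_request)


def fake_server(app, path):
    """Drive a WSGI app the way a server would and return the event order."""
    events.clear()

    def start_response(status, headers, exc_info=None):
        events.append("status " + status)

    result = app({"PATH_INFO": path}, start_response)
    try:
        for chunk in result:
            events.append("sent " + chunk.decode())
    finally:
        if hasattr(result, "close"):
            result.close()
    return list(events)
```

Any WSGI server (for example one built with wsgiref.simple_server.make_server) calls close() the same way. The client normally has the full body by then, but whether the connection is released before the callback finishes depends on the server.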
You can't. GAE sends its response when the RequestHandler returns. If you actually need threads, you will need to host your web application with another company.
Related
In one of the views in my Django application, I need to perform a relatively lengthy network I/O operation. The problem is that other requests must wait for this request to complete, even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, it is used to perform background tasks independent of the request (so I cannot use the result of the task in the response to the request).
Is there a way to process views asynchronously in Django, so that other requests can be processed while the network request is pending?
Edit: What I forgot to mention is that my application is a web service using Django REST framework, so the result of a view is a JSON response, not a page that I can later modify using AJAX.
The usual solution here is to offload the task to Celery and return a "please wait" response from your view. If you want, you can then use an Ajax call to periodically hit a view that reports whether the response is ready, and redirect when it is.
You want to keep that HTTP connection open for an extended period of time but still allow other requests to be handled, right? There's no simple solution to this problem. Also, any solution will sit a level below Django, since it depends on how your server processes requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past. I was using uWSGI to provide the WSGI interface between my Python application and nginx. In uWSGI I used the asynchronous functions to suspend my long-running connection whenever it had to wait on I/O. These functions let you suspend a request until there is something to read or write, and in the meantime allow other connections to be serviced.
The async calls mentioned above use "green threads". They are much lighter weight than regular threads, and you control when execution switches from one thread to another.
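uWSGI's own async API is not reproduced here. As a rough analogue of the same cooperative idea, this sketch uses Python's asyncio (a swapped-in technique, not uWSGI): each simulated connection hands control back while it waits on I/O, so another connection is serviced in the meantime, all on one thread. The function names and delays are made up for illustration.

```python
import asyncio


async def slow_connection(log):
    log.append("slow: started")
    # Stand-in for waiting on a long network call; control returns to the loop.
    await asyncio.sleep(0.2)
    log.append("slow: finished")


async def quick_connection(log):
    log.append("quick: started")
    await asyncio.sleep(0)  # yield once, as a short request might
    log.append("quick: finished")


async def serve_both():
    log = []
    # Both "connections" share one thread; switching happens only at await points.
    await asyncio.gather(slow_connection(log), quick_connection(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(serve_both()))
```

The quick connection starts and finishes while the slow one is still waiting, which is the behaviour the answer gets from uWSGI's suspend/resume calls.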
I am not saying that it is a good solution for your scenario[1], but the simple answer is using the following pattern:
    async_result = some_task.delay(arg1)
    result = async_result.get()
Check the documentation for the get method. Instead of the delay method, you can use anything that returns an AsyncResult (such as the apply_async method).
[1] Why might it be a bad idea? Keeping a connection open while it waits for a long time is bad for Django (it is not designed for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere), and may be treated as a timeout by the browser. So... it seems a Bad Idea[TM] to use this pattern in a Django REST framework view.
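Running the Celery version needs a broker and a worker, so here is the same submit-then-block shape sketched with the standard library's concurrent.futures as a stand-in (not Celery): submit() plays the role of delay(), and Future.result(timeout=...) plays the role of AsyncResult.get(timeout=...). fetch_remote and view are hypothetical placeholders for the slow network call and the Django view. As the footnote warns, the caller still blocks until the result arrives, so a timeout is worth setting.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Shared pool, loosely analogous to a pool of Celery workers.
executor = ThreadPoolExecutor(max_workers=4)


def fetch_remote(arg):
    """Hypothetical slow I/O operation."""
    time.sleep(0.1)
    return {"arg": arg, "status": "done"}


def view(arg, timeout=5.0):
    future = executor.submit(fetch_remote, arg)  # like some_task.delay(arg1)
    try:
        # like async_result.get(timeout=...): blocks this request until done
        return future.result(timeout=timeout)
    except TimeoutError:
        # Give up instead of holding the connection open indefinitely.
        return {"arg": arg, "status": "timed out"}
```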
I understand that the purpose of WSGI middleware is to extend functionality between a request and a response.
But can some of this code be run after the response is returned?
I need to store a request/response log in an external database, and wouldn't want this to slow the response times down.
Thanks! :)
Did you consider spawning a new thread or using a queue manager?
This way you can return the response from the view and process the data in the background.
This answer here has more information:
How to fork a process in python/django?
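A minimal sketch of the spawn-a-thread suggestion applied to the logging question, assuming a server that permits threads (the GAE sandbox discussed at the top did not). LoggingMiddleware, save_log, saved and saved_event are hypothetical names; save_log stands in for the write to the external database. A daemon thread is killed if the process exits, so a record can be lost; a queue manager is the sturdier choice.

```python
import threading

saved = []                       # stand-in for the external log database
saved_event = threading.Event()  # lets a caller (or test) wait for the write


def save_log(record):
    """Hypothetical slow write of one request/response record to a database."""
    saved.append(record)
    saved_event.set()


class LoggingMiddleware:
    """Hypothetical WSGI middleware that logs each request on a background thread."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            captured["status"] = status
            return start_response(status, headers, exc_info)

        body = self.app(environ, capturing_start_response)
        # Assumes the app called start_response before returning (true for
        # simple apps, not for apps that defer it inside a generator).
        record = {"path": environ.get("PATH_INFO"), "status": captured.get("status")}
        # Hand the slow write to a daemon thread and return the body at once,
        # so the write runs alongside sending the response instead of before it.
        threading.Thread(target=save_log, args=(record,), daemon=True).start()
        return body


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


app = LoggingMiddleware(hello_app)
```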
We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data as well as some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that don't return an HTTP response for about a minute. The API has a separate call to get the status of the current operation, but that means the user has to initiate the long-running operation and then have a separate process poll for status. We don't want our API to work that way; we want the initial request to just return a response saying that the operation is in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
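The flow described above can be sketched as a small in-memory registry, assuming a single process (a real deployment would need shared storage such as a database or cache so that any web worker can answer a poll). OperationRegistry, start, on_callback, status and poll_upstream are hypothetical names; the actual asynchronous call to the upstream API is represented only by a comment.

```python
import threading
import uuid


class OperationRegistry:
    """Tracks long-running upstream operations by id (single-process sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ops = {}

    def start(self):
        # The async request to the upstream API would be fired here; the client
        # gets an "in progress" answer right away instead of waiting a minute.
        op_id = uuid.uuid4().hex
        with self._lock:
            self._ops[op_id] = {"state": "in_progress", "data": None}
        return {"id": op_id, "state": "in_progress"}

    def on_callback(self, op_id, data):
        # The upstream API reported completion: remember the result.
        with self._lock:
            self._ops[op_id] = {"state": "complete", "data": data}

    def status(self, op_id, poll_upstream):
        # Answer a client's status poll.
        with self._lock:
            op = self._ops.get(op_id)
        if op is None:
            return {"id": op_id, "state": "unknown"}
        if op["state"] == "complete":
            return {"id": op_id, "state": "complete", "data": op["data"]}
        # Still running: pass the poll through to the upstream status call.
        return poll_upstream(op_id)
```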
Does this seem like a reasonable approach? Which Python libraries should we be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or Twisted, or something a little heavier-weight like Celery. Celery seems to be the usual recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago. You have probably already solved your problem, but for anyone else facing the same situation, I'll post what I did at the time.
Basically, I used the Celery library (http://www.celeryproject.org/) to dispatch the long-running operation asynchronously, returning a successful HTTP response that contained the Celery job id. The async operation recorded its status and job id in a SQLite database (which was enough for what I was doing), and the client queried the job status via REST.
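A minimal sketch of that job-status table using the standard library's sqlite3. A plain thread stands in for the Celery worker so the example runs on its own; with Celery, the task body would make the same status updates and the job id could be the task's id. The table layout and the function names (set_status, get_status, start_job, poll_until_done) are assumptions for illustration.

```python
import sqlite3
import threading
import time
import uuid

# One shared connection guarded by a lock; check_same_thread=False lets the
# worker thread use it. A file path instead of ":memory:" would let separate
# processes (web server and worker) share the table.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
db_lock = threading.Lock()


def set_status(job_id, status):
    with db_lock:
        with conn:  # commits the transaction
            conn.execute(
                "INSERT OR REPLACE INTO jobs (id, status) VALUES (?, ?)",
                (job_id, status),
            )


def get_status(job_id):
    with db_lock:
        row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None


def long_running_operation(job_id):
    set_status(job_id, "running")
    # ... the slow work would go here ...
    set_status(job_id, "done")


def start_job():
    """What the POST view would do: record the job, dispatch it, return the id."""
    job_id = uuid.uuid4().hex
    set_status(job_id, "queued")
    threading.Thread(target=long_running_operation, args=(job_id,)).start()
    return job_id


def poll_until_done(job_id, timeout=5.0, interval=0.05):
    """What the polling client does, minus the HTTP layer."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status(job_id) == "done":
            return "done"
        time.sleep(interval)
    return get_status(job_id)
```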
I am developing a Django web server to which another machine (with a known IP) can upload a spreadsheet. After the spreadsheet has been uploaded, I want to trigger some processing/validation/analysis on it (which can take more than 5 minutes, too long for the other server to reasonably wait for a response) and then send the other machine (with a known IP) an HttpResponse indicating that the data processing is finished.
I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:
    # processing.py
    import views

    def spreadsheet(*args, **kwargs):
        print("[robot voice] processing spreadsheet.........")
        views.finished_processing_spreadsheet()

    # views.py
    from django.http import HttpResponse

    import processing

    def upload_spreadsheet(request):
        print("save the spreadsheet somewhere")
        return HttpResponse("started processing spreadsheet")
        processing.data()  # never runs: nothing after a return statement executes

    def finished_processing_spreadsheet():
        print("send good news to other server (with known IP)")
I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?
I tried using Django's request_finished signal, but it does not trigger the processing.spreadsheet() method after the HttpResponse is returned. I tried using a decorator on views.upload_spreadsheet and ran into the same problem.
I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.
Thanks for your help!
In fact, Django has a synchronous model. If you want to do real async processing, you need a message queue. The most commonly used one with Django is Celery; it may look like a bit of overkill, but it's a good answer.
Why do we need this? Because in a WSGI app, Apache hands the request to the application, and the application returns text. Only once the application finishes executing does Apache acknowledge the end of the request.
The problem with your implementation is that if the number of spreadsheets in process is equal to the number of workers: your website will not respond anymore.
You should use a background task queue, which basically means having two processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can set model_with_spreadsheet.processed = datetime.datetime.now().
You should use a background job manager like django-ztask (very easy setup), Celery (very powerful, probably overkill in your case), or even the uWSGI spooler (which obviously requires a uWSGI deployment).
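The delegation described above can be sketched with a queue and a single worker, using the standard library's queue and threading modules as a stand-in for a separate background process such as Celery or django-ztask (in production the worker should be its own process, so it runs independently of the web server's workers). SpreadsheetUpload is a hypothetical stand-in for model_with_spreadsheet, and upload_spreadsheet_view is a simplified, hypothetical view; the worker sets processed to datetime.datetime.now() when it finishes, as suggested above.

```python
import datetime
import queue
import threading


class SpreadsheetUpload:
    """Hypothetical stand-in for the Django model holding the spreadsheet."""

    def __init__(self, name):
        self.name = name
        self.processed = None  # set by the worker when processing finishes


jobs = queue.Queue()


def worker():
    while True:
        upload = jobs.get()
        try:
            # ... the long processing/validation/analysis would go here ...
            upload.processed = datetime.datetime.now()
            # This is where the other server (known IP) could be notified.
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def upload_spreadsheet_view(name):
    """Save, enqueue, and respond immediately (record returned for inspection)."""
    upload = SpreadsheetUpload(name)
    jobs.put(upload)
    return upload, "started processing spreadsheet"
```

Because the view only enqueues work, the number of spreadsheets being processed no longer ties up the web server's workers.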
When I get a GET request from a user, I send them the response and then spend maybe a second logging information about that request. Is there a way to close the connection once the response is ready but continue with the logging, so that the user doesn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
"App Engine does not support sending data to the user's browser before exiting the handler. Some web servers use this technique to "stream" data to the user's browser over a period of time in response to a single request. App Engine does not support this streaming technique."
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will require bundling your data up and sending it to the task queue for processing and logging, so you may not save much time, and the results may not look much like you'd want; for example, you'd be logging from a different request, so you might need to radically alter the logging.
Still, you could try.
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, as long as you have less than 10 KB of data (the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.