I have been playing with other frameworks, such as NodeJS, lately.
I love being able to return a response and still perform further operations afterwards.
e.g.
def view(request):
    do_something()
    return HttpResponse()
    do_more_stuff()  # not possible!
Maybe Django already offers a way to perform operations after returning a response; if that is the case, that would be great.
Help would be very much appreciated! =D
Not out of the box, as you've already returned out of the method. You could use something like Celery, which would pass the do_more_stuff task onto a queue and then run do_more_stuff() outside of the HTTP request/response flow.
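To make the shape of that concrete, here is a rough in-process sketch of the idea Celery implements, using only the standard library (in a real deployment you'd use Celery so the work is handed to a broker and survives the web process; all names here are made up for illustration):

```python
import queue
import threading

# A minimal stand-in for what Celery provides: the view drops work onto a
# queue and returns immediately; a background worker picks it up later.
tasks = queue.Queue()

def worker():
    while True:
        func, args = tasks.get()
        try:
            func(*args)          # runs outside the request/response flow
        finally:
            tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def do_more_stuff(results, message):
    results.append(message)

def view(results):
    # ... do_something() ...
    tasks.put((do_more_stuff, (results, "logged after response")))
    return "HttpResponse"        # returns without waiting for the task
```

The view returns as soon as the work is enqueued; the worker runs do_more_stuff whenever it gets to it.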
Django lets you accomplish this with Signals; more information can be found here. (Please note, as I said in the comments below, signals aren't non-blocking, but they do allow you to execute code after returning a response in a view.)
If you're looking into doing many, many asynchronous requests and need them to be non-blocking, you may want to check out Tornado.
Because you're returning from the function, do_more_stuff will never be called.
If you're looking at doing heavy-lifting stuff, then queuing something up before you return, as Ross suggests, is the way to go (+1 for Celery).
If, however, you're looking at returning some content, then doing something, then returning more content to the user, streaming is probably what you're looking for. You can pass an iterator or a generator to HttpResponse, and it'll iterate and push out the content in a trickle fashion. It feels a bit yuck, but if you're a generator rockstar you may be able to do enough in various states to accomplish what you want.
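A minimal sketch of that streaming idea (the generator name and the chunks are made up; in current Django you'd pass the generator to StreamingHttpResponse rather than plain HttpResponse):

```python
# Each yield is pushed toward the client as it is produced, so you can do
# work between chunks.  In a Django view you would return something like
# StreamingHttpResponse(trickle()).
def trickle():
    yield "<p>first part</p>"
    # ... do_more_stuff() runs here, after the first chunk has gone out ...
    yield "<p>second part</p>"
```

The catch is that the work between yields still happens inside the request; the client has just already received the earlier chunks.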
Or I guess you could simply redesign your page to use a lot of ajax to do what you need, including firing off events to django views, reading data from views, etc.
It kind of comes down to where the burden of async is going to sit: client, server or response.
I'm not that familiar with node.js yet, but it would be interesting to see the use case you're talking about.
EDIT: I did a little more looking into signals, and while they do occur in-process, there is a built-in request_finished signal fired after the request has been handled by Django, though it's more of a catch-all than something specific.
I am working on a system that collects data from REST servers and manipulates it.
One of the requirements is making many frequent API requests. We currently implement this in a somewhat synchronous way. I could easily implement this using threads, but considering the system might need to support thousands of requests per second, I think it would be wise to use Twisted, which can implement this efficiently. I have seen this blog post, and the whole idea of a DeferredList seems to do the trick, but I am kind of stuck on how to structure my class (I can't wrap my mind around how Twisted works).
Can you try to outline the structure of the class that will run the event-loop and will be able to get a list of URLs and headers and return a list of results after making the requests asynchronously?
Do you know of a better way of implementing this in python?
Sounds like you want to use a Twisted project called treq, which allows you to send requests to HTTP endpoints. It works a lot like requests. I recently helped a friend here in this thread; my answer there might be of some use to you. If you still need more help, just leave a comment and I'll try my best to update this answer.
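On the "better way in Python" question: if thousands of requests per second turns out not to be the real requirement, a thread pool from the standard library can take you surprisingly far. A hedged sketch, where fetch_one is a hypothetical stand-in for whatever HTTP call you use (requests.get, urllib, ...), injected so it can be swapped out:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(requests_list, fetch_one, max_workers=8):
    """Run fetch_one(url, headers) concurrently for each (url, headers)
    pair and return the results in the same order as the input."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, url, headers)
                   for url, headers in requests_list]
        return [f.result() for f in futures]
```

With treq you'd get the same fan-out from a DeferredList on a single thread, which scales better once request volume gets very high.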
In one of the views in my Django application, I need to perform a relatively lengthy network IO operation. The problem is that other requests must wait for this request to complete, even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, Celery is used to perform background tasks independent of the request (so I cannot use the result of the task in the response to the request).
Is there a way to process views asynchronously in django so while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using django rest framework. So the result of a view is a json response not a page that I can later modify using AJAX.
The usual solution here is to offload the task to celery, and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
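The protocol behind that pattern, sketched with an in-memory dict standing in for Celery's result backend (all names here are made up; the real worker would run via long_task.delay and store its result in the backend):

```python
import uuid

RESULTS = {}  # task_id -> result; Celery's result backend plays this role

def start_view(payload):
    """The DRF view: kick off the task and return a 'please wait' JSON."""
    task_id = str(uuid.uuid4())
    RESULTS[task_id] = None                 # pending
    # real code: long_task.delay(task_id, payload)
    return {"status": "pending", "task_id": task_id}

def poll_view(task_id):
    """The view the client polls until the result is ready."""
    result = RESULTS.get(task_id)
    if result is None:
        return {"status": "pending"}
    return {"status": "done", "result": result}
```

Since this is a JSON web service rather than a page, the client simply polls the second endpoint with the task_id instead of using Ajax on a page.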
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uwsgi to provide the WSGI interface between my python application and nginx. In uwsgi I used the asynchronous functions to suspend my long running connection when there was time to wait on the IO connections. The methods allow you to ask it to suspend things until there is something to read or write and then allow other connections to be serviced.
The above-mentioned async calls use "green threads". They're much lighter weight than regular threads, and you have control over when you move from thread to thread.
I am not saying that it is a good solution for your scenario[1], but the simple answer is using the following pattern:
async_result = some_task.delay(arg1)
result = async_result.get()
Check the documentation for the get method. Instead of the delay method, you can use anything that returns an AsyncResult (like the apply_async method).
[1] Why it may be a bad idea? Having an ongoing connection waiting a lot is bad for Django (it is not ready for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere) and may be identified as a timeout from the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django Rest Framework view.
I am trying to do some logging of a view in Django (mod_wsgi). However, I want to do this so that the client is not held up, similar to the PerlCleanupHandler phase available in mod_perl. Notice the line "It is used to execute some code immediately after the request has been served (the client went away)". This is exactly what I want.
I want the client to be serviced first and then do the logging. Is there a good insertion point for this code in mod_wsgi or Django? I looked into the suggestions here and here. However, in both cases, when I put a simple time.sleep(10) and do a curl/wget on the URL, the curl doesn't return for 10 seconds.
I even tried to put the time.sleep in __del__ method in the HttpResponse Object as suggested in one of the comments, but still no dice.
I am aware that I could probably put the logging data onto a queue and do some background processing to store the logs, but I would like to avoid that approach if there is a simpler/easier one.
Any suggestions?
See documentation at:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
for a WSGI specific (not mod_wsgi specific) way.
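The WSGI-level trick that page describes boils down to wrapping the response iterable so a callback fires from close(), which the server calls only after the response has been sent. A sketch (the middleware and callback names are mine, not from the mod_wsgi docs):

```python
class ClosingIterator:
    """Wrap a WSGI response iterable and run a callback from close(),
    which the server invokes after the response has been sent."""
    def __init__(self, iterable, callback):
        self.iterable = iterable
        self.callback = callback

    def __iter__(self):
        return iter(self.iterable)

    def close(self):
        try:
            close = getattr(self.iterable, "close", None)
            if close is not None:
                close()          # honour the wrapped iterable's own close()
        finally:
            self.callback()      # e.g. write the log entry here

def cleanup_middleware(application, cleanup):
    """Wrap a WSGI app so cleanup() runs after each response is sent."""
    def wrapped(environ, start_response):
        return ClosingIterator(application(environ, start_response), cleanup)
    return wrapped
```

Note the callback still runs in the worker process; it just runs after the client has been answered rather than before.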
Django, as pointed out by others, may have its own ways of doing this as well, although whether they fire after all the response content is written back to the client, I don't know.
This probably is a very noobish question, but I want to make sure my code is doing what I think it's doing.
Here's what I'm after: get a request, make a decision, respond to the request with the decision, and only then log it. The sequence is important because writes can be slow, and I want to make sure a response is published before any writes take place.
Here's the sample code:
class ConferenceGreetingHandler(webapp.RequestHandler):
    def get(self):
        self.post()

    def post(self):
        xml_template(self, 'templates/confgreeting.xml')
        new_log = Log()
        new_log.log = 'test'
        new_log.put()
I think I'm serving a response before logging; is this in fact true? Also, is there a better way to do this? Again, sorry for the super-noobishness...
EDIT: Here's the template:
def xml_template(handler, page, values=None):
    path = os.path.join(os.path.dirname(__file__), page)
    handler.response.headers["Content-Type"] = "text/xml"
    handler.response.out.write(template.render(path, values))
No matter what you do, App Engine will not send a response to a user until your handler code completes. There's currently no way, unfortunately, to tell App Engine "send the response now, I won't output any more".
You have a few options:
Just put the log entry synchronously. Datastore writes aren't hugely expensive with respect to wallclock latency, especially if you minimize the number of index updates needed.
Enqueue a task queue task to write the log data. If you use pull queues, you can fetch log entries and write them in batches to the datastore from another task or the backend.
Start the datastore write for the log entry as soon as you have the relevant data, and use an asynchronous operation, allowing you to overlap the write with some of your processing.
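The third option in generic form, with a concurrent.futures future standing in for App Engine's asynchronous datastore call (on App Engine itself you'd use the datastore's async put API rather than a thread pool; the function names here are made up):

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(write_log, build_response):
    """Kick off the log write early and overlap it with building the
    response; block on it only at the very end."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(write_log)   # start the write immediately
        body = build_response()            # overlaps with the write
        pending.result()                   # surface any write error
    return body
```

This doesn't get the response out before the write completes, but it hides most of the write's latency behind work you were doing anyway.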
Much depends on what xml_template does. If it does a handler.response.out.write(...), then the handler has done its part to serve a response. The webapp framework does the rest once your handler completes normally.
I'm not sure what your "better way" question refers to, but two things stand out.
First, logger.warn("test") will write to the system log, rather than creating a Log instance that you have to (possibly) track down and delete later.
Second, if you're going to use xml_template widely, make it an instance method. Create your own subclass of webapp.RequestHandler, put xml_template there, and then subclass that for your specific handlers.
Updated: I overlooked the part about wanting to get the response out before doing writes. If you're suffering from slow writes, first look very carefully at whether the entity being written to is over-indexed (indexed on fields that will never be queried against). If that isn't enough to get performance into an acceptable range, the advice Nick lays out is the way to go.
So I have the handler below:
class PublishHandler(BaseHandler):
    def post(self):
        message = self.get_argument("message")
        some_function(message)
        self.write("success")
The problem that I'm facing is that some_function() takes some time to execute and I would like the post request to return straight away when called and for some_function() to be executed in another thread/process if possible.
I'm using berkeley db as the database and what I'm trying to do is relatively simple.
I have a database of users each with a filter. If the filter matches the message, the server will send the message to the user. Currently I'm testing with thousands of users and hence upon each publication of a message via a post request it's iterating through thousands of users to find a match. This is my naive implementation of doing things and hence my question. How do I do this better?
You might be able to accomplish this by using your IOLoop's add_callback method like so:
loop.add_callback(lambda: some_function(message))
Tornado will execute the callback in the next IOLoop pass, which may (I'd have to dig into Tornado's guts to know for sure, or alternatively test it) allow the request to complete before that code gets executed.
The drawback is that the long-running code you've written will still take time to execute, and that may end up blocking another request. That's not ideal if you have a lot of these requests coming in at once.
The more foolproof solution is to run it in a separate thread or process. The best way in Python is to use a process, due to the GIL (I'd highly recommend reading up on that if you're not familiar with it). However, on a single-processor machine the threaded implementation will work just as well, and may be simpler to implement.
If you're going the threaded route, you can build a nice "async executor" module with a mutex, a thread, and a queue. Check out the multiprocessing module if you want to go the route of using a separate process.
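A hedged sketch of the threaded route using a stdlib executor instead of a hand-rolled mutex/queue module (some_function's body is a stand-in for the filter scan from the question; swap in multiprocessing if the work is CPU-bound, because of the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

# One shared executor per process; the handler submits work and returns
# immediately, so writing "success" isn't blocked on some_function.
executor = ThreadPoolExecutor(max_workers=4)

def some_function(message, matches):
    # stand-in for the expensive scan over thousands of user filters
    matches.append(message.upper())

def post(message, matches):
    executor.submit(some_function, message, matches)
    return "success"                # written before the work finishes
```

In a real Tornado handler you'd call self.write("success") in post and let the executor grind through the filter matching in the background.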
I've tried this, and I believe the request does not complete before the callbacks are called.
I think a dirty hack would be to call two levels of add_callback, e.g.:
def get(self):
    ...
    def _deferred():
        ioloop.add_callback(<whatever you want>)
    ioloop.add_callback(_deferred)
    ...
But these are hacks at best. I'm looking for a better solution right now; I'll probably end up with some message queue or a simple thread solution.