Is there a cleanup phase in mod_wsgi? - Python

I am trying to do some logging of a view in Django (under mod_wsgi). However, I want to do it in a way that doesn't hold the client up, similar to the PerlCleanupHandler phase available in mod_perl. Notice the line "It is used to execute some code immediately after the request has been served (the client went away)". This is exactly what I want.
I want the client to be serviced first and then do the logging. Is there a good insertion point for the code in mod_wsgi or Django? I looked into the suggestions here and here. However, in both cases, when I put in a simple time.sleep(10) and do a curl/wget on the URL, the curl doesn't return for 10 seconds.
I even tried putting the time.sleep in the __del__ method of the HttpResponse object, as suggested in one of the comments, but still no dice.
I am aware that I can probably put the logging data onto a queue and do some background processing to store the logs, but I would like to avoid that approach if there is another, simpler/easier one.
Any suggestions?

See the documentation at:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
for a WSGI-specific (not mod_wsgi-specific) way.
Django, as pointed out by others, may have its own ways of doing things as well, although whether they fire after all the response content has been written back to the client, I don't know.
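As a rough illustration of the approach that wiki page describes, cleanup code can be hooked to the point where the WSGI server calls close() on the response iterable, which happens after the response content has been consumed (whether the client has actually received everything by then depends on server-side buffering). This is only a minimal sketch; the wrapper and callback names are made up for the example:

class CleanupWrapper(object):
    # Wraps a WSGI response iterable and runs a callback after the server
    # has finished with the response and called close().
    def __init__(self, iterable, callback):
        self._iterable = iterable
        self._callback = callback

    def __iter__(self):
        for chunk in self._iterable:
            yield chunk

    def close(self):
        try:
            if hasattr(self._iterable, 'close'):
                self._iterable.close()
        finally:
            self._callback()

def cleanup_middleware(application, callback):
    # WSGI middleware: the callback runs once per request, after close().
    def wrapper(environ, start_response):
        return CleanupWrapper(application(environ, start_response), callback)
    return wrapper

# Hypothetical wiring in a Django wsgi.py:
# application = cleanup_middleware(get_wsgi_application(), write_access_log)

Note that the callback still runs in the same worker process, so a slow callback ties up that worker even though the client is no longer waiting.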

Related

Python SimpleHTTPServer keeps going down and I don't know why

This is my first time working with SimpleHTTPServer, and honestly my first time working with web servers in general, and I'm having a frustrating problem. I'll start up my server (via SSH) and then I'll go try to access it and everything will be fine. But I'll come back a few hours later and the server won't be running anymore. And by that point the SSH session has disconnected, so I can't see if there were any error messages. (Yes, I know I should use something like screen to save the shell messages -- trying that right now, but I need to wait for it to go down again.)
I thought it might just be that my code was throwing an exception, since I had no error handling, but I added what should be a pretty catch-all try/catch block, and I'm still experiencing the issue. (I feel like this is probably not the best method of error handling, but I'm new at this... so let me know if there's a better way to do this)
import SimpleHTTPServer

class MyRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # (this is the only function my request handler has)
    def do_GET(self):
        if 'search=' in self.path:
            try:
                pass  # (my code that does stuff)
            except Exception as e:
                pass  # (log the error to a file)
            return
        else:
            SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)
Does anyone have any advice for things to check, or ways to diagnose the issue? Most likely, I guess, is that my code is just crashing somewhere else... but if there's anything in particular I should know about the way SimpleHTTPServer operates, let me know.
I've never had SimpleHTTPServer running for an extended period of time; usually I just use it to transfer a couple of files in an ad-hoc manner. But I guess it wouldn't be so bad, as long as your security constraints are handled elsewhere (i.e. a firewall) and you don't need much scale.
The SSH session is ending, which is killing your tasks (both foreground and background tasks). There are two solutions to this:
Like you've already mentioned, use a utility such as screen to prevent your session from ending.
If you really want this to run for an extended period of time, you should look into your operating system's documentation on how to start/stop/enable services (nowadays most of the cool kids are using systemd, but you might also find yourself using SysVinit or some other init system); a rough sketch of a systemd unit follows this list.
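For the service route, a minimal systemd unit might look roughly like this; the unit name, paths and script location are placeholders you'd adjust for your own setup:

# /etc/systemd/system/simplehttp.service  (hypothetical name and paths)
[Unit]
Description=Ad-hoc Python HTTP server
After=network.target

[Service]
ExecStart=/usr/bin/python /home/me/server.py
WorkingDirectory=/home/me
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start it with systemctl enable --now simplehttp, and read its output later with journalctl -u simplehttp instead of relying on the SSH session staying alive.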
EDIT:
This link is in the comments, but I thought I should put it here as it answers this question pretty well

Performing a blocking request in a Django view

In one of the views in my Django application, I need to perform a relatively lengthy network IO operation. The problem is that other requests must wait for this request to be completed, even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, it is used to perform background tasks independent of the request (so I cannot use the result of the task for the response to the request).
Is there a way to process views asynchronously in Django, so that while the network request is pending, other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using Django REST framework, so the result of a view is a JSON response, not a page that I can later modify using AJAX.
The usual solution here is to offload the task to Celery and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
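A rough sketch of that pattern, assuming a working Celery setup (the task body, URL wiring and helper names are invented for the example):

# tasks.py
from celery import shared_task

@shared_task
def slow_network_call(query):
    # The lengthy network IO happens here, in a Celery worker,
    # not in the web worker that handled the request.
    return do_the_network_io(query)  # hypothetical helper

# views.py
from celery.result import AsyncResult
from django.http import JsonResponse
from .tasks import slow_network_call

def start(request):
    result = slow_network_call.delay(request.GET.get('q', ''))
    return JsonResponse({'task_id': result.id, 'status': 'pending'})

def poll(request, task_id):
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({'status': 'done', 'result': result.get()})
    return JsonResponse({'status': 'pending'})

The client (or an Ajax loop) keeps hitting the poll view until it reports done.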
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uWSGI to provide the WSGI interface between my Python application and nginx. In uWSGI I used the asynchronous functions to suspend my long-running connection while it was waiting on the IO connections. The methods allow you to ask it to suspend things until there is something to read or write, and then allow other connections to be serviced.
The above-mentioned async calls use "green threads". They're much lighter weight than regular threads, and you have control over when you move from thread to thread.
I am not saying that it is a good solution for your scenario [1], but the simple answer is to use the following pattern:
async_result = some_task.delay(arg1)
result = async_result.get()
Check the documentation for the get method. And instead of using the delay method, you can use anything that returns an AsyncResult (like the apply_async method).
[1] Why might it be a bad idea? Having a connection waiting for a long time is bad for Django (it is not built for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere), and may be identified as a timeout by the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django REST Framework view.
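Put together in a Django REST Framework view, the blocking variant looks roughly like this (the task and parameter names are illustrative), with all the caveats from [1] still applying:

from rest_framework.views import APIView
from rest_framework.response import Response
from .tasks import some_task  # hypothetical Celery task

class BlockingView(APIView):
    def get(self, request):
        async_result = some_task.delay(request.query_params.get('arg', ''))
        # Blocks this web worker until the Celery worker finishes (or the timeout hits).
        result = async_result.get(timeout=30)
        return Response({'result': result})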

Handling wsgi requests in my application

I've hit some conceptual road-block on something that should be simple.
I have a single-threaded, plain python application, which runs indefinitely.
I would like to be able to query information about the internal state of the application from the web, via an http request.
Most of the models I have seen for this, e.g. Python's WSGIServer from flup.server.fcgi, require me to provide a callback function to handle a request. The callback then becomes the starting point of my program.
Instead of reworking my application's logic to fit this request-->callback model, I need to be able to handle requests within my own application's logic. For example, periodically throughout the execution of my application, I would check if there are any pending HTTP/WSGI requests, handle them, and continue.
Do I have to open a socket myself and deal with all the socket logic in order to achieve this? I don't necessarily mind, but I suspect I don't need to reinvent the wheel here.
Am I thinking about this incorrectly?
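One way to avoid hand-rolled sockets is the WSGI reference server in the standard library: it can be driven one request at a time with handle_request(), so your own loop stays in charge. A minimal sketch of the idea (the status app and the main-loop work are placeholders):

from wsgiref.simple_server import make_server

def status_app(environ, start_response):
    # Expose whatever internal state you want to report.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'state: running\n']

httpd = make_server('', 8000, status_app)
httpd.timeout = 0.1  # handle_request() gives up after this long if nothing is pending

while True:
    do_main_application_work()  # hypothetical: one iteration of your existing loop
    httpd.handle_request()      # serve at most one pending HTTP request, then continue

A request that arrives between polls simply waits in the listen backlog until the next handle_request() call.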

Trigger function after returning HttpResponse from Django view

I am developing a Django webserver to which another machine (with a known IP) can upload a spreadsheet. After the spreadsheet has been uploaded, I want to trigger some processing/validation/analysis on it (which can take >5 minutes --- too long for the other server to reasonably wait for a response) and then send the other machine (with the known IP) an HttpResponse indicating that the data processing is finished.
I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:
# processing.py
def spreadsheet(*args, **kwargs):
    print "[robot voice] processing spreadsheet........."
    views.finished_processing_spreadsheet()

# views.py
def upload_spreadsheet(request):
    print "save the spreadsheet somewhere"
    return HttpResponse("started processing spreadsheet")
    processing.data()  # unreachable -- this is the intent, not working code

def finished_processing_spreadsheet():
    print "send good news to other server (with known IP)"
I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?
I tried using Django's request_finished signaling framework, but this does not trigger the processing.spreadsheet() method after returning the HttpResponse. I tried using a decorator on views.upload_spreadsheet, with the same problem.
I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.
Thanks for your help!
In fact, Django has a synchronous model. If you want to do real async processing, you need a message queue. The most commonly used one with Django is Celery; it may look a bit "overkill", but it's a good answer.
Why do we need this? Because in a WSGI app, Apache gives the request to the application, and the application returns text. It's only once the application finishes its execution that Apache acknowledges the end of the request.
The problem with your implementation is that if the number of spreadsheets being processed equals the number of workers, your website will not respond anymore.
You should use a background task queue; basically, have two processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can set model_with_spreadsheet.processed = datetime.datetime.now().
You should use a background job manager like django-ztask (very easy setup), Celery (very powerful, probably overkill in your case), or even the uWSGI spooler (which obviously requires a uWSGI deployment).
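A rough sketch of the Celery variant (the task body, model, helper functions and notification endpoint are all invented for the example):

# tasks.py
import datetime
import requests
from celery import shared_task
from .models import Spreadsheet  # hypothetical model

NOTIFY_URL = 'http://other-machine-ip/processing-done/'  # hypothetical endpoint on the known IP

@shared_task
def process_spreadsheet(spreadsheet_id):
    sheet = Spreadsheet.objects.get(pk=spreadsheet_id)
    run_validation_and_analysis(sheet)  # hypothetical: the >5 minute part
    sheet.processed = datetime.datetime.now()
    sheet.save()
    requests.post(NOTIFY_URL, data={'spreadsheet_id': spreadsheet_id})

# views.py
from django.http import HttpResponse
from .tasks import process_spreadsheet

def upload_spreadsheet(request):
    sheet = save_uploaded_spreadsheet(request)  # hypothetical helper
    process_spreadsheet.delay(sheet.pk)
    return HttpResponse("started processing spreadsheet")

The upload view returns immediately; the worker does the slow part and then notifies the other machine itself.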

Performing non-blocking requests? - Django

I have been playing with other frameworks, such as NodeJS, lately.
I love being able to return a response and still do further operations afterwards.
e.g.
def view(request):
    do_something()
    return HttpResponse()
    do_more_stuff()  # not possible!!!
Maybe Django already offers a way to perform operations after returning a response; if that is the case, that would be great.
Help would be very much appreciated! =D
Not out of the box, as you've already returned out of the method. You could use something like Celery, which would pass the do_more_stuff task onto a queue and then run do_more_stuff() outside of the HTTP request/response flow.
Django lets you accomplish this with signals; more information can be found here. (Please note, as I said in the comments below, signals aren't non-blocking, but they do allow you to execute code after returning a response in a view.)
If you're looking into doing many, many asynchronous requests and need them to be non-blocking, you may want to check out Tornado.
Because you're returning from the function, do_more_stuff will never be called.
If you're looking at doing heavy-lifting stuff, queue something up before you return, as Ross suggests (+1 for Celery).
If, however, you're looking at returning some content, then doing something and returning more content to the user, streaming is probably what you're looking for. You can pass an iterator or a generator to HttpResponse, and it'll iterate and push out the content in a trickle fashion. It feels a bit yuck, but if you're a generator rockstar, you may be able to do enough in various states to accomplish what you want.
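A rough sketch of that generator approach (what the slow work is, and whether your client copes with a trickled response, are up to you; newer Django versions provide StreamingHttpResponse for exactly this purpose):

from django.http import HttpResponse

def view(request):
    def stream():
        yield "first chunk of the page\n"
        do_something_slow()  # hypothetical: work done mid-response
        yield "the rest of the page once that finished\n"
    return HttpResponse(stream())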
Or I guess you could simply redesign your page to use a lot of Ajax to do what you need, including firing off events to Django views, reading data from views, etc.
It kind of comes down to where the burden of async is going to sit: client, server or response.
I'm not that familiar with node.js yet, but it would be interesting to see the use case you're talking about.
EDIT: I did a little more looking into signals, and while they do occur in-process, there is a built-in request_finished signal that fires after the request has been handled by Django, though it's more of a catch-all than something specific.
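For completeness, hooking that signal looks roughly like this (where you put the receiver and what it does are up to you); note that it fires for every request and still runs in the same process, so the worker is occupied until it returns:

from django.core.signals import request_finished
from django.dispatch import receiver

@receiver(request_finished)
def after_request(sender, **kwargs):
    # Runs after Django has finished handling each request.
    do_more_stuff()  # hypothetical follow-up work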
