Log that a request was responded to after responding - python

This probably is a very noobish question, but I want to make sure my code is doing what I think it's doing.
Here's what I'm after - get a request, make a decision, respond to the request with the decision, and only then log it. The sequence is important because writes can be slow, and I want to make sure a response is sent before any writes take place.
Here's the sample code:
class ConferenceGreetingHandler(webapp.RequestHandler):
    def get(self):
        self.post()

    def post(self):
        xml_template(self, 'templates/confgreeting.xml')
        new_log = Log()
        new_log.log = 'test'
        new_log.put()
I think I'm serving a response before logging; is this in fact true? Also, is there a better way to do this? Again, sorry for super-noobishness...
EDIT: Here's the template:
def xml_template(handler, page, values=None):
    path = os.path.join(os.path.dirname(__file__), page)
    handler.response.headers["Content-Type"] = "text/xml"
    handler.response.out.write(template.render(path, values))

No matter what you do, App Engine will not send a response to a user until your handler code completes. There's currently no way, unfortunately, to tell App Engine "send the response now, I won't output any more".
You have a few options:
Just put the log entry synchronously. Datastore writes aren't hugely expensive with respect to wallclock latency, especially if you minimize the number of index updates needed.
Enqueue a task queue task to write the log data (see the sketch after this list). If you use pull queues, you can fetch log entries and write them to the datastore in batches from another task or a backend.
Start the datastore write for the log entry as soon as you have the relevant data, and use an asynchronous operation, allowing you to overlap the write with some of your processing.
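For example, option 2 can be as simple as the following sketch, which uses App Engine's deferred library (a push-queue variant of the idea; it assumes the Log model from the question and that the deferred handler is enabled in app.yaml):

from google.appengine.ext import deferred

def write_log(message):
    # Runs later, on the task queue, after the response has gone out.
    new_log = Log()
    new_log.log = message
    new_log.put()

class ConferenceGreetingHandler(webapp.RequestHandler):
    def post(self):
        xml_template(self, 'templates/confgreeting.xml')
        # Enqueueing is cheap; the datastore write happens in a separate task.
        deferred.defer(write_log, 'test')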

Much depends on what xml_template does. If it does a self.response.write(...), then the handler has done its part to serve a response. The webapp framework does the rest once your handler completes normally.
I'm not sure what your "better way" question refers to, but two things stand out.
First, logger.warn("test") will write to the system log, rather than creating a Log instance that you have to (possibly) track down and delete later.
Second, if you're going to use xml_template widely, make it an instance method. Create your own subclass of webapp.RequestHandler, put xml_template there, and then subclass that for your specific handlers.
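For instance, the refactor might look like this (a sketch; BaseHandler is an illustrative name):

import os
from google.appengine.ext.webapp import template

class BaseHandler(webapp.RequestHandler):
    def xml_template(self, page, values=None):
        # Same logic as the module-level helper, now shared by all handlers.
        path = os.path.join(os.path.dirname(__file__), page)
        self.response.headers["Content-Type"] = "text/xml"
        self.response.out.write(template.render(path, values))

class ConferenceGreetingHandler(BaseHandler):
    def post(self):
        self.xml_template('templates/confgreeting.xml')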
Updated: I overlooked the part about wanting to get the response out before doing writes. If you're suffering from slow writes, first look very carefully at whether the entity you're writing to is over-indexed (indexed on fields that will never be queried against). If that isn't enough to get performance into an acceptable range, the advice Nick lays out is the way to go.

Related

multi-thread application with queue, best approach to deliver each reply to the right caller?

Consider a multi-thread application, in which different pieces of code send commands to a background thread/service through a command queue, and consequently the service puts the replies in a reply queue. Is there a commonly accepted “strategy” for ensuring that a specific reply gets delivered to the rightful caller?
Coming to my specific case (a program in Python 3), I was thinking about setting both the command and reply queues to maxsize=1, so that each caller can just put the command and wait for the reply (which will surely be its own), but this could potentially affect the performance of the application. Or else send a sort of unique code (a hash or similar) with the command, and have the background service include that same code in the reply, so that a caller can go through the replies, looking for its own reply and putting the other replies back in the queue. Honestly I don't like either of them. Is there something else that could be done?
I’m asking this because I’ve spent a fair amount of hours investigating online about threading, and reading through the official documentation, but I couldn’t make up my mind on this. I’m unsure which could be the right/best approach and most importantly I’d like to know if there is a mainstream approach to achieve this.
I don’t provide any code because the question deals with general application design.
Associating a unique identifier with each request is basically the standard solution to this problem.
This is the solution employed by protocols from various eras, from DNS to HTTP/2.
You can build whatever abstractions you like on top of it. Consider this semi-example using Twisted's Deferred:
from twisted.internet.defer import Deferred, inlineCallbacks

def request(args):
    uid = next(id_generator)
    request_queue.put((uid, args))
    result = waiting[uid] = Deferred()
    return result

def process_responses():
    uid, response = response_queue.get()
    result = waiting.pop(uid)
    result.callback(response)

@inlineCallbacks
def foo_doer():
    foo = yield request(...)
    # foo is the response from the response queue.
The basic mechanism is nothing more than unique-id-tagged items in the two queues. But the user isn't forced to track these UIDs. Instead, they get an easy-to-use abstraction that just gives them the result they want.
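The same mechanism works without Twisted; here is a rough Python 3 sketch using concurrent.futures.Future as the per-command placeholder (all names are illustrative):

import itertools
import queue
import threading
from concurrent.futures import Future

request_queue = queue.Queue()
response_queue = queue.Queue()
id_generator = itertools.count()
waiting = {}  # uid -> Future, shared with the response-draining thread

def request(args):
    uid = next(id_generator)
    future = waiting[uid] = Future()
    request_queue.put((uid, args))
    return future  # the caller blocks on future.result()

def process_responses():
    # Runs in a dedicated thread, routing each reply to its caller.
    while True:
        uid, response = response_queue.get()
        waiting.pop(uid).set_result(response)

threading.Thread(target=process_responses, daemon=True).start()

# A caller is guaranteed to get the reply to *its* command:
#     answer = request(("do-thing", 42)).result()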

Prevent a race condition where many requests may concurrently trigger a call that should run only once

So this isn't necessarily a Django question, I'm just having a mental block getting my head around the logic, but I suppose Django might provide some ways to manually lock records that would be helpful.
Essentially, a user may upload one or many files at a time. Each file is uploaded via a separate request. When the user goes above 90% of their storage quota, I'd like to send an email notifying them as such, but I only want to send a single email. So my current workflow is to check their usage, make sure they have not yet been sent a reminder, and:
if usage_pct >= settings.VDISK_HIGH_USAGE_THRESHOLD and disk.last_high_usage_reminder is None:
    disk.last_high_usage_reminder = timezone.now()
    disk.save()
    vdisks_high_usage_send_notice(user)
The above code, however, often lets more than one email through. So my first thought is to somehow lock the disk record before even checking the value, and then unlock it after saving it. Is this possible and/or advisable, or is there a better method?
OK I'm quietly confident I've solved the problem using this answer: https://stackoverflow.com/a/7794220/698289
Firstly, implement this utility function:
from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    transaction.commit()
And then modify my original code to flush and reload the disk record:
flush_transaction()
disk = user.profile.diskstorage
if usage_pct >= settings.VDISK_HIGH_USAGE_THRESHOLD and disk.last_high_usage_reminder is None:
    disk.last_high_usage_reminder = timezone.now()
    disk.save()
    vdisks_high_usage_send_notice(user)
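An alternative that closes the check-then-set race with a row lock is select_for_update (Django 1.4+; DiskStorage is an illustrative model name, and on Django 1.6+ you would use transaction.atomic() instead of commit_on_success):

from django.conf import settings
from django.db import transaction
from django.utils import timezone

with transaction.commit_on_success():
    # Lock the row so concurrent requests serialize on this check.
    disk = DiskStorage.objects.select_for_update().get(
        pk=user.profile.diskstorage.pk)
    if usage_pct >= settings.VDISK_HIGH_USAGE_THRESHOLD and \
            disk.last_high_usage_reminder is None:
        disk.last_high_usage_reminder = timezone.now()
        disk.save()
        vdisks_high_usage_send_notice(user)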

trigger function after returning HttpResponse from django view

I am developing a Django webserver to which another machine (with a known IP) can upload a spreadsheet. After the spreadsheet has been uploaded, I want to trigger some processing/validation/analysis on it (which can take more than 5 minutes, too long for the other server to reasonably wait for a response) and then send the other machine (with a known IP) an HttpResponse indicating that the data processing is finished.
I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:
# processing.py
def spreadsheet(*args, **kwargs):
    print "[robot voice] processing spreadsheet........."
    views.finished_processing_spreadsheet()

# views.py
def upload_spreadsheet(request):
    print "save the spreadsheet somewhere"
    return HttpResponse("started processing spreadsheet")
    processing.data()

def finished_processing_spreadsheet():
    print "send good news to other server (with known IP)"
I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?
I tried using django's request_finished signaling framework but this does not trigger the processing.spreadsheet() method after returning the HttpResponse. I tried using a decorator on views.upload_spreadsheet with the same problem.
I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.
Thanks for your help!
In fact, Django has a synchronous model. If you want to do real async processing, you need a message queue. The one most used with Django is Celery; it may look a bit "overkill", but it's a good answer.
Why do we need this? Because in a WSGI app, Apache gives the request to the executable, and the executable returns text. It's only once the executable finishes executing that Apache acknowledges the end of the request.
The problem with your implementation is that if the number of spreadsheets in process equals the number of workers, your website will stop responding.
You should use a background task queue, basically have 2 processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can do model_with_spreadsheet.processed = datetime.datetime.now().
You should use a background job manager like django-ztask (very easy setup), celery (very powerful, probably overkill in your case) or even uwsgi spooler (which obviously requires uwsgi deployment).
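For example, with Celery the delegation might look like this sketch (model, helper, and broker settings are all illustrative assumptions):

# tasks.py
import datetime
from celery import Celery

app = Celery('tasks', broker='amqp://')  # broker choice is an assumption

@app.task
def process_spreadsheet(spreadsheet_id):
    sheet = Spreadsheet.objects.get(pk=spreadsheet_id)  # hypothetical model
    run_validation_and_analysis(sheet)  # hypothetical helper: the >5 minute job
    sheet.processed = datetime.datetime.now()
    sheet.save()
    notify_other_server(sheet)  # hypothetical: POST the good news to the known IP

# views.py
def upload_spreadsheet(request):
    sheet = save_uploaded_spreadsheet(request)  # hypothetical helper
    process_spreadsheet.delay(sheet.pk)  # enqueue and return immediately
    return HttpResponse("started processing spreadsheet")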

Python Tornado - making POST return immediately while async function keeps working

so I have a handler below:
class PublishHandler(BaseHandler):
    def post(self):
        message = self.get_argument("message")
        some_function(message)
        self.write("success")
The problem that I'm facing is that some_function() takes some time to execute and I would like the post request to return straight away when called and for some_function() to be executed in another thread/process if possible.
I'm using Berkeley DB as the database, and what I'm trying to do is relatively simple.
I have a database of users each with a filter. If the filter matches the message, the server will send the message to the user. Currently I'm testing with thousands of users and hence upon each publication of a message via a post request it's iterating through thousands of users to find a match. This is my naive implementation of doing things and hence my question. How do I do this better?
You might be able to accomplish this by using your IOLoop's add_callback method like so:
loop.add_callback(lambda: some_function(message))
Tornado will execute the callback in the next IOLoop pass, which may (I'd have to dig into Tornado's guts to know for sure, or alternatively test it) allow the request to complete before that code gets executed.
The drawback is that that long-running code you've written will still take time to execute, and this may end up blocking another request. That's not ideal if you have a lot of these requests coming in at once.
The more foolproof solution is to run it in a separate thread or process. The best way in Python is to use a process, due to the GIL (I'd highly recommend reading up on that if you're not familiar with it). However, on a single-processor machine the threaded implementation will work just fine, and may be simpler to implement.
If you're going the threaded route, you can build a nice "async executor" module with a mutex, a thread, and a queue. Check out the multiprocessing module if you want to go the route of using a separate process.
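A minimal version of that threaded executor might look like this (a sketch; on Python 2 the module is named Queue):

import queue
import threading

_tasks = queue.Queue()

def _worker():
    # Drain the queue forever, running one job at a time off the IOLoop thread.
    while True:
        func, args = _tasks.get()
        try:
            func(*args)
        finally:
            _tasks.task_done()

threading.Thread(target=_worker, daemon=True).start()

def submit(func, *args):
    _tasks.put((func, args))

# In the handler, replace the blocking call:
#     submit(some_function, message)
#     self.write("success")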
I've tried this, and I believe the request does not complete before the callbacks are called.
I think a dirty hack would be to call two levels of add_callback, e.g.:
def get(self):
...
def _defered():
ioloop.add_callback(<whatever you want>)
ioloop.add_callback(_defered)
...
But these are hacks at best. I'm looking for a better solution right now, probably will end up with some message queue or simple thread solution.

Performing non-blocking requests? - Django

I have been playing with other frameworks, such as NodeJS, lately.
I love the possibility of returning a response and still being able to do further operations.
e.g.
def view(request):
    do_something()
    return HttpResponse()
    do_more_stuff()  # not possible!!!
Maybe Django already offers a way to perform operations after returning a response; if that is the case, that would be great.
Help would be very much appreciated! =D
Not out of the box, as you've already returned out of the method. You could use something like Celery, which would pass the do_more_stuff task onto a queue and then run do_more_stuff() outside of the HTTP request/response flow.
Django lets you accomplish this with Signals, more information can be found here. (Please note, as I said in comments below, signals aren't non-blocking, but they do allow you to execute code after returning a response in a view.)
If you're looking into doing many, many asynchronous requests and need them to be non-blocking, you may want to check out Tornado.
Because you're returning from the function, do_more_stuff will never be called.
If you're looking at doing heavy-lifting work, queue something up before you return, as Ross suggests (+1 for Celery).
If, however, you're looking at returning some content, then doing something and returning more content to the user, streaming is probably what you're looking for. You can pass an iterator or a generator to HttpResponse, and it'll iterate and push out the content in a trickle fashion. It feels a bit yuck, but if you're a generator rockstar you may be able to do enough in various states to accomplish what you want.
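A rough sketch of that streaming shape (helper names are illustrative; modern Django would use StreamingHttpResponse for this):

from django.http import HttpResponse

def view(request):
    def stream():
        yield render_first_part()   # hypothetical helper
        do_something()              # work happens between chunks
        yield render_second_part()  # hypothetical helper
    return HttpResponse(stream())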
Or I guess you could simply redesign your page to use a lot of ajax to do what you need, including firing off events to django views, reading data from views, etc.
It kind of comes down to where the burden of async is going to sit: client, server or response.
I'm not that familiar with node.js yet, but it would be interesting to see the use case you're talking about.
EDIT: I did a little more looking into signals, and while they do occur in-process, there is a built-in request_finished signal fired after the request has been handled by Django, though it's more of a catch-all than something specific.
