Django bootstrap/middleware/enter-exit - python

I have the following problem: I want to add some kind of setup/teardown to Django for each request. For example, at the beginning of each user's request I want to start data collection, and at the end of the request dump all collected data to the database (1).
What comes to mind right now: at the start of the middleware, instantiate an object (like a singleton); every other part of the code can import this object and use its methods, and then the same middleware will scrap the object before returning the response. The only concern I have is thread safety, so maybe create a global dict and register keys built from a hash of url + session_id, or maybe the request object's id (the internal Python object id; is that a good way to go?). At the end of the request the key would be scrapped from the dict.
Any recommendations, thoughts, ideas?
(1) Please do not ask me why I cannot access the DB directly or anything like that. This is only an example. I'm looking for a general idea for something like enter and exit, but request-response wise, that can be imported anywhere in the code and used safely.

In your middleware, you can create a new object for the data you want to maintain and put it in the request.META dict. It will then be available wherever the request is available. In this case, I don't think you need to worry about thread safety, as each request creates a new object.
If you just want to create a data object once when request processing starts, destroy it after the request is processed, and no other code references this data, then you can look at the request_started and request_finished signals.
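A minimal sketch of the signals approach (start_collection and dump_collection are hypothetical hooks; note that these handlers do not receive the request object itself):

from django.core.signals import request_started, request_finished
from django.dispatch import receiver

@receiver(request_started)
def on_request_started(sender, **kwargs):
    start_collection()      # hypothetical setup hook, runs before each request is processed

@receiver(request_finished)
def on_request_finished(sender, **kwargs):
    dump_collection()       # hypothetical teardown hook, runs after the response is sent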

Middleware is most certainly not thread-safe. You should not store anything per-request either on the middleware object or in the global namespace.
The usual way to do this sort of thing is to annotate it onto the request object. Middleware and views have access to this, but to get at it anywhere else (e.g. in the model) you'll need to pass it around.
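For example, a minimal sketch of middleware that annotates the request on the way in and dumps the collected data on the way out (dump_to_database is a hypothetical persistence helper):

class DataCollectionMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request.collected_data = []                   # "enter": fresh container per request
        response = self.get_response(request)
        dump_to_database(request.collected_data)      # "exit": hypothetical persistence helper
        return response

Views (and anything else the request is passed to) can append to request.collected_data without thread-safety concerns, because every request gets its own object.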

Related

Modifying quart.request - acceptable? (Python API)

I need to store some per-request metrics and telemetry (such as timestamps) in Quart (the Python web framework). Is it acceptable behaviour to modify quart.request and add variables?
It appears to work, and it's similar to how I would have done it in Flask, but I'm not sure if it is considered bad practice in Quart.
The background is that I want to store fine-grained telemetry (namely time stamps for when certain things happen inside a request) and not just the total request time.
Regards,
Niklas
Yeah, it's best to extend the Request class and then assign this new class to the request_class attribute on the app object.
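A minimal sketch of that approach, assuming the telemetry lives in a timestamps attribute (the attribute name and route are just illustrative):

import time

from quart import Quart, Request, request

class TelemetryRequest(Request):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timestamps = {}                  # per-request telemetry store

app = Quart(__name__)
app.request_class = TelemetryRequest          # Quart builds every request from this class

@app.route("/")
async def index():
    request.timestamps["handler_start"] = time.monotonic()
    return "ok"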

Preserving value of variables between subsequent requests in Python Django

I have a Django application that logs the character sequences from an autocomplete interface. Each time a call is made to the server, the parameters are added to a list, and when the user submits the query, the list is written to a file.
Since I am not sure how to preserve the list between subsequent calls, I relied on a global variable, say query_logger. Now I can preserve the list in the following way:
query_logger = None

def log_query(query, completions, submitted=False):
    global query_logger
    if query_logger is None:
        query_logger = list()
    query_logger.append((query, completions, submitted))  # append a single tuple
    if submitted:
        query_logger = None
While this hack works for a single client sending requests, I don't think it is a stable solution when requests come from multiple clients. My question is two-fold:
What is the order of execution of requests: Do they follow first come first serve (especially if the requests are asynchronous)?
What is a better approach for doing this?
If your Django server is single-threaded, then yes, it will respond to requests as it receives them. If you're using WSGI or another proxy, that becomes more complicated. Regardless, I think you'll want to use a db to store the information.
I encountered a similar problem and ended up using sqlite to store the data temporarily, because that's super simple and easy to manage. You'll want to use IP addresses or create a unique ID passed as a url parameter in order to identify clients on subsequent requests.
I also scheduled a daily task (using cron on ubuntu) that goes through and removes any incomplete requests that haven't been completed (excluding those started in the last hour).
You must not use global variables for this.
The proper answer is to use the session - that is exactly what it is for.
The simplest (bad) solution would be to have a global variable. Either way, you need some in-memory location or a db to store this info.
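A minimal sketch of the session approach (write_to_file is a hypothetical helper standing in for however the log actually gets persisted):

def log_query(request, query, completions, submitted=False):
    # keep the per-user list in the session instead of a module-level global
    logged = request.session.get("query_log", [])
    logged.append({"query": query, "completions": completions, "submitted": submitted})
    request.session["query_log"] = logged
    if submitted:
        write_to_file(logged)                 # hypothetical persistence helper
        del request.session["query_log"]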

Does a JsonProperty deserialize only upon access?

In Google App Engine NDB, there is a property type JsonProperty which takes a Python list or dictionary and serializes it automatically.
The structure of my model depends on the answer to this question, so I want to know when exactly an object is deserialized? For example:
# a User model has a property "dictionary" which is of type JsonProperty
# will the following deserialize the dictionary?
object = User.get_by_id(someid)
# or will it not get deserialized until I actually access the dictionary?
val = object.dictionary['value']
ndb.JsonProperty follows the docs and does things the same way you would when defining a custom property: it defines make_value_from_datastore and get_value_for_datastore methods.
The documentation doesn't tell you when these methods get called, because that is up to the db implementation within App Engine.
However, it's pretty likely they're going to get called whenever the model has to access the database. For example, from the documentation for get_value_for_datastore:
A property class can override this to use a different data type for the datastore than for the model instance, or to perform other data conversion just prior to storing the model instance.
If you really need to verify what's going on, you can provide your own subclass of JsonProperty like this:
import os

class LoggingJsonProperty(ndb.JsonProperty):
    def make_value_from_datastore(self, value):
        with open(os.path.expanduser('~/test.log'), 'a') as logfile:
            logfile.write('make_value_from_datastore called\n')
        return super(LoggingJsonProperty, self).make_value_from_datastore(value)
You can log the JSON string, the backtrace, etc. if you want. And obviously you can use a standard logging function instead of sticking things in a separate log. But this should be enough to see what's happening.
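For example, wiring it into a model like the one in the question might look like this (assumes from google.appengine.ext import ndb and the LoggingJsonProperty defined above; the id is a placeholder):

class User(ndb.Model):
    dictionary = LoggingJsonProperty()

user = User.get_by_id(42)          # watch the log to see when deserialization happens
val = user.dictionary['value']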
Another option, of course, is to read the code, which I believe is in appengine/ext/db/__init__.py.
Since it's not documented, the details could change from one version to the next, so you'll have to either re-run your tests or re-read the code each time you upgrade, if you need to be 100% sure.
The correct answer is that it does indeed load the item lazily, upon access:
https://groups.google.com/forum/?fromgroups=#!topic/appengine-ndb-discuss/GaUSM7y4XhQ

Which one data load method is the best for perfomance?

For example, I have a user object stored in the database (Redis).
It has several fields:
String nick
String password
String email
List posts
List comments
Set followers
and so on...
In my Python program I have a class (User) with the same fields for this object. Instances of this class map to objects in the database. The question is how to get the data from the DB for the best performance:
1. Load the values for every field when the instance is created, and initialize the fields with them.
2. Load a field's value from the DB each time that field's value is requested.
3. Like the second option, but after the value is loaded, replace the field property with the loaded value (so subsequent reads come from the instance).
p.s. redis runs on localhost
The method entirely depends on the requirements.
If there is only one client reading and modifying the properties, this is a rather simple problem. When modifying data, just change the instance attributes in your current Python program and -- at the same time -- keep the DB in sync while keeping your program responsive. To that end, you should outsource blocking calls to another thread or make use of greenlets. If there is only one client, there definitely is no need to fetch a property from the DB on each value lookup.
If there are multiple clients reading the data and only one client modifying the data, you have to think about which level of synchronization you need. If you need 100 % synchronization, you will have to fetch data from the DB on each value lookup.
If there are multiple clients changing the data in the database you better look into a rock-solid industry standard solution rather than writing your own DB cache/mapper.
Your distinction between (2) and (3) does not really make sense. If you fetch data on every lookup, there is no need to 'store' data. You see, if there can be multiple clients involved these things quickly become quite complex and it's really hard to get it right.
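As an illustration of option (3), a minimal sketch using redis-py with a lazily cached property (the user:<id> hash layout and field name are assumptions):

import redis

r = redis.Redis()                          # assumes Redis on localhost, as in the question

class User:
    def __init__(self, user_id):
        self.user_id = user_id
        self._nick = None                  # not loaded yet

    @property
    def nick(self):
        # option (3): hit Redis on first access, then reuse the value cached on the instance
        if self._nick is None:
            raw = r.hget("user:%s" % self.user_id, "nick")   # hypothetical key layout
            self._nick = raw.decode() if raw is not None else None
        return self._nick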

django querysets + memcached: best practices

Trying to understand what happens during a django low-level cache.set()
Particularly, details about what part of the queryset gets stored in memcached.
First, am I interpreting the django docs correctly?
a queryset (python object) has/maintains its own cache
access to the database is lazy; even if the queryset contains 1000 records, if I do an object.get for 1 record, then the dbase will only be accessed once, for that 1 record
when accessing a django view via apache prefork MPM, every time a particular daemon instance X ends up invoking a view that includes something like "tournres_qset = TournamentResult.objects.all()", this will result, each time, in a new tournres_qset object being created. That is, anything that may have been cached internally by a tournres_qset python object from a previous (tcp/ip) visit is not used at all by a new request's tournres_qset.
Now the questions about saving things to memcached within the view.
Let's say I add something like this at the top of the view:
tournres_qset = cache.get('tournres', None)
if tournres_qset is None:
    tournres_qset = TournamentResult.objects.all()
    cache.set('tournres', tournres_qset, timeout)
# now start accessing tournres_qset
# ...
What gets stored during the cache.set()?
Does the whole queryset (python object) get serialized and saved?
Since the queryset hasn't been used yet to get any records, is this just a waste of time, since no particular records' contents are actually being saved in memcache? (Any future requests will get the queryset object from memcache, which will always start fresh, with an empty local queryset cache; access to the dbase will always occur.)
If the above is true, should I just always re-save the queryset at the end of the view, after it's been used throughout the view to access some records, so that the queryset's local cache gets updated and always gets re-saved to memcached? But then, this would always result in once again serializing the queryset object. So much for speeding things up.
Or does the cache.set() force the queryset object to iterate and fetch all the records from the dbase, which will also get saved in memcache? Would everything get saved, even if the view only accesses a subset of the query set?
I see pitfalls in all directions, which makes me think that I'm misunderstanding a whole bunch of things.
Hope this makes sense; I'd appreciate clarifications or pointers to some "standard" guidelines. Thanks.
Querysets are lazy, which means they don't call the database until they're evaluated. One way they could get evaluated would be to serialize them, which is what cache.set does behind the scenes. So no, this isn't a waste of time: the entire contents of your Tournament model will be cached, if that's what you want. It probably isn't: and if you filter the queryset further, Django will just go back to the database, which would make the whole thing a bit pointless. You should just cache the model instances you actually need.
Note that the third point in your initial set isn't quite right, in that this has nothing to do with Apache or preforking. It's simply that a view is a function like any other, and anything defined in a local variable inside a function goes out of scope when that function returns. So a queryset defined and evaluated inside a view goes out of scope when the view returns the response, and a new one will be created the next time the view is called, ie on the next request. This is the case whichever way you are serving Django.
However, and this is important, if you do something like set your queryset to a global (module-level) variable, it will persist between requests. Most of the ways that Django is served, and this definitely includes mod_wsgi, keep a process alive for many requests before recycling it, so the value of the queryset will be the same for all of those requests. This can be useful as a sort of bargain-basement cache, but is difficult to get right because you have no idea how long the process will last, plus other processes are likely to be running in parallel which have their own versions of that global variable.
Updated to answer questions in the comment
Your questions show that you still haven't quite grokked how querysets work. It's all about when they are evaluated: if you list, or iterate, or slice a queryset, that evaluates it, and it's at that point the database call is made (I count serialization under iterating, here), and the results stored in the queryset's internal cache. So, if you've already done one of those things to your queryset, and then set it to the (external) cache, that won't cause another database hit.
But, every filter() operation on a queryset, even one that's already evaluated, is another database hit. That's because it's a modification of the underlying SQL query, so Django goes back to the database - and returns a new queryset, with its own internal cache.
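A minimal sketch of the pattern recommended above, caching the evaluated rows rather than the lazy queryset (the 'tournres' key and TournamentResult model come from the question; the view name and timeout are just illustrative):

from django.core.cache import cache

def tournament_view(request):
    results = cache.get('tournres')
    if results is None:
        # list() forces evaluation, so the cached value holds the actual rows,
        # not a lazy queryset that would hit the database again on every request
        results = list(TournamentResult.objects.all())
        cache.set('tournres', results, 60 * 15)   # timeout in seconds
    # render the response using results ...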
