Avoiding multiple queries for multiple counts

I'm trying to find a good way to get a few analytics counts from my DB in a single query instead of running a separate query for each one.
What I have right now is a function that returns the counts:
def get_counts(self):
    return {
        'item_one_counts': self.items_one.count(),
        'item_two_counts': self.items_two.count(),
        'item_three_count': self.items_three.count(),
    }
etc.
I know I can do this with a raw query that does a SELECT as count1,2,3 FROM table X
Is there a more django-y way to do this?

You're a tad late if you want to get the counts in an instance method. The easiest way to optimize this is to use annotations in the initial query:
from django.db.models import Count

# distinct=True keeps the joins across several relations from inflating each count
obj = MyModel.objects.annotate(item_one_count=Count('items_one', distinct=True)) \
    .annotate(item_two_count=Count('items_two', distinct=True)) \
    .annotate(item_three_count=Count('items_three', distinct=True)) \
    .get(...)
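To see what a single query with several counts looks like at the SQL level, here is a minimal sketch using the standard library's sqlite3 module. The tables parent, item_one and item_two are hypothetical stand-ins for the model and its relations, not anything from the question. It also shows why counting across two joined relations needs COUNT(DISTINCT ...): the joins multiply the rows.

```python
import sqlite3

# Hypothetical schema: one parent row with two independent child tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE item_one (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE item_two (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent (id) VALUES (1);
    INSERT INTO item_one (parent_id) VALUES (1), (1), (1);
    INSERT INTO item_two (parent_id) VALUES (1), (1);
""")

JOINS = """
    FROM parent p
    LEFT JOIN item_one a ON a.parent_id = p.id
    LEFT JOIN item_two b ON b.parent_id = p.id
    WHERE p.id = 1
"""

# Plain COUNT sees the 3 x 2 joined rows, so both counts come out as 6.
naive = conn.execute("SELECT COUNT(a.id), COUNT(b.id)" + JOINS).fetchone()

# COUNT(DISTINCT ...) counts each child row once, still in a single query.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT a.id), COUNT(DISTINCT b.id)" + JOINS
).fetchone()

print(naive)     # (6, 6)
print(distinct)  # (3, 2)
```

Both results come from one round-trip each; only the second one matches the real number of child rows.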
Another good optimization is to cache the results, e.g.:
class MyModel(models.Model):
    def get_item_one_count(self):
        # Query only once per instance; later calls reuse the stored value
        if not hasattr(self, '_item_one_count'):
            self._item_one_count = self.items_one.count()
        return self._item_one_count
    ...
    def get_counts(self):
        return {
            'item_one_counts': self.get_item_one_count(),
            'item_two_counts': self.get_item_two_count(),
            'item_three_count': self.get_item_three_count(),
        }
Combine these methods (i.e. .annotate(_item_one_count=Count('items_one'))), and you get the counts in a single query whenever you control the query, with a fallback method for the cases where you can't annotate the results.
Another option is to perform the annotation in your model manager, but you will no longer have fine-grained control over the queries.
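The combination of annotation and cached fallback can be sketched without Django at all. In the snippet below, Counted is a hypothetical stand-in for a model instance, and fake_query stands in for self.items_one.count(); neither name comes from the answer. The point is only that an annotated _item_one_count attribute short-circuits the fallback query.

```python
class Counted:
    """Stand-in for a model instance; `fetch_count` plays the role of
    self.items_one.count() and is purely hypothetical."""

    def __init__(self, fetch_count, **annotations):
        self._fetch_count = fetch_count
        # Annotated values (e.g. _item_one_count) arrive as plain attributes.
        for name, value in annotations.items():
            setattr(self, name, value)

    def get_item_one_count(self):
        # Only fall back to a query when nothing was annotated or cached.
        if not hasattr(self, "_item_one_count"):
            self._item_one_count = self._fetch_count()
        return self._item_one_count


calls = []


def fake_query():
    calls.append("query")  # record every simulated database hit
    return 3


plain = Counted(fake_query)
plain.get_item_one_count()
plain.get_item_one_count()      # second call is served from the cache

annotated = Counted(fake_query, _item_one_count=7)
annotated.get_item_one_count()  # no query: the annotation is reused

print(len(calls))  # 1
```

Three calls to the getter cost a single simulated query, and the annotated instance costs none.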

Related

AttributeError: 'QuerySet' object has no attribute 'is_staff' [duplicate]

I was having a debate on this with some colleagues. Is there a preferred way to retrieve an object in Django when you're expecting only one?
The two obvious ways are:
try:
    obj = MyModel.objects.get(id=1)
except MyModel.DoesNotExist:
    # We have no object! Do something...
    pass
And:
objs = MyModel.objects.filter(id=1)
if len(objs) == 1:
    obj = objs[0]
else:
    # We have no object! Do something...
    pass
The first method seems behaviorally more correct, but uses exceptions in control flow which may introduce some overhead. The second is more roundabout but won't ever raise an exception.
Any thoughts on which of these is preferable? Which is more efficient?

get() is provided specifically for this case. Use it.
Option 2 is almost precisely how the get() method is actually implemented in Django, so there should be no "performance" difference (and the fact that you're thinking about it indicates you're violating one of the cardinal rules of programming, namely trying to optimize code before it's even been written and profiled -- until you have the code and can run it, you don't know how it will perform, and trying to optimize before then is a path of pain).

You can install a module called django-annoying and then do this:
from annoying.functions import get_object_or_None
obj = get_object_or_None(MyModel, id=1)
if not obj:
    # omg the object was not found, do some error stuff
    pass

Option 1 is correct. In Python an exception has equal overhead to a return. For a simplified proof you can look at this.
Option 2 is what Django is doing in the backend: get() calls filter() and raises an exception if no item is found or if more than one object is found.
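A rough sketch of that behaviour in plain Python, assuming rows are simple dicts rather than model instances. The two exception classes and the get() helper are hypothetical stand-ins for MyModel.DoesNotExist, MyModel.MultipleObjectsReturned and QuerySet.get(); this is not Django's actual implementation.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for MyModel.DoesNotExist."""


class MultipleObjectsReturned(Exception):
    """Hypothetical stand-in for MyModel.MultipleObjectsReturned."""


def get(rows, **lookup):
    # "filter" step: keep rows whose fields match every lookup value.
    matches = [
        row for row in rows
        if all(row.get(field) == value for field, value in lookup.items())
    ]
    if not matches:
        raise DoesNotExist(lookup)
    if len(matches) > 1:
        raise MultipleObjectsReturned(lookup)
    return matches[0]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "b"}]

print(get(rows, id=1))  # {'id': 1, 'name': 'a'}
```

Calling get(rows, id=9) raises DoesNotExist and get(rows, name="b") raises MultipleObjectsReturned, which is the distinction a bare filter() plus len() check throws away.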

I'm a bit late to the party, but with Django 1.6 there is the first() method on querysets.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.first
Returns the first object matched by the queryset, or None if there is no matching object. If the QuerySet has no ordering defined, then the queryset is automatically ordered by the primary key.
Example:
p = Article.objects.order_by('title', 'pub_date').first()
Note that first() is a convenience method, the following code sample is equivalent to the above example:
try:
    p = Article.objects.order_by('title', 'pub_date')[0]
except IndexError:
    p = None

Why do all that work? Replace 4 lines with 1 builtin shortcut. (This does its own try/except.)
from django.shortcuts import get_object_or_404
obj = get_object_or_404(MyModel, id=1)

I can't speak with any experience of Django but option #1 clearly tells the system that you are asking for 1 object, whereas the second option does not. This means that option #1 could more easily take advantage of cache or database indexes, especially where the attribute you're filtering on is not guaranteed to be unique.
Also (again, speculating) the second option may have to create some sort of results collection or iterator object since the filter() call could normally return many rows. You'd bypass this with get().
Finally, the first option is both shorter and omits the extra temporary variable - only a minor difference but every little helps.

Some more info about exceptions. If they are not raised, they cost almost nothing. Thus if you know you are probably going to have a result, use the exception, since using a conditional expression you pay the cost of checking every time, no matter what. On the other hand, they cost a bit more than a conditional expression when they are raised, so if you expect not to have a result with some frequency (say, 30% of the time, if memory serves), the conditional check turns out to be a bit cheaper.
But this is Django's ORM, and the round-trip to the database, or even a cached result, is likely to dominate the performance characteristics, so favor readability. In this case, since you expect exactly one result, use get().
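If you want to check the exception-versus-conditional trade-off on your own machine rather than trust remembered percentages, a small timeit sketch like the following can help. It uses a plain dict lookup instead of the ORM, and the function names are made up for illustration; the numbers it prints will vary by interpreter and hardware.

```python
import timeit

data = {"present": 1}


def with_try(key):
    # Cheap when the key exists; pays for the exception on a miss.
    try:
        return data[key]
    except KeyError:
        return None


def with_check(key):
    # Pays for the membership test on every call, hit or miss.
    if key in data:
        return data[key]
    return None


def measure(key, number=100_000):
    # Returns (try/except seconds, conditional seconds) for `number` lookups.
    return (
        timeit.timeit(lambda: with_try(key), number=number),
        timeit.timeit(lambda: with_check(key), number=number),
    )


if __name__ == "__main__":
    print("hit :", measure("present"))
    print("miss:", measure("absent"))
```

Either way, both timings are usually tiny next to a database round-trip, which is the point made above.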

I've played with this problem a bit and discovered that the option 2 executes two SQL queries, which for such a simple task is excessive. See my annotation:
objs = MyModel.objects.filter(id=1)  # This does not execute any SQL
if len(objs) == 1:  # This executes SELECT COUNT(*) FROM XXX WHERE filter
    obj = objs[0]  # This executes SELECT x, y, z, .. FROM XXX WHERE filter
else:
    # we have no object! do something
    pass
An equivalent version that executes a single query is:
items = [item for item in MyModel.objects.filter(id=1)]  # executes SELECT x, y, z FROM XXX WHERE filter
count = len(items)  # Does not execute any query; items is a standard list.
if count == 0:
    return None
return items[0]
By switching to this approach, I was able to substantially reduce the number of queries my application executes.

.get()
Returns the object matching the given lookup parameters, which should
be in the format described in Field lookups.
get() raises MultipleObjectsReturned if more than one object was
found. The MultipleObjectsReturned exception is an attribute of the
model class.
get() raises a DoesNotExist exception if an object wasn't found for
the given parameters. This exception is also an attribute of the model
class.
.filter()
Returns a new QuerySet containing objects that match the given lookup
parameters.
Note
use get() when you want to get a single unique object, and filter()
when you want to get all objects that match your lookup parameters.

Interesting question, but for me option #2 reeks of premature optimisation. I'm not sure which is more performant, but option #1 certainly looks and feels more pythonic to me.

I suggest a different design.
If you want to perform a function on a possible result, you could derive from QuerySet, like this: http://djangosnippets.org/snippets/734/
The result is pretty awesome, you could for example:
MyModel.objects.filter(id=1).yourFunction()
Here, filter returns either an empty queryset or a queryset with a single item. Your custom queryset functions are also chainable and reusable. If you want to perform it for all your entries: MyModel.objects.all().yourFunction().
They are also ideal to be used as actions in the admin interface:
def yourAction(self, request, queryset):
    queryset.yourFunction()

Option 1 is more elegant, but be sure to use try..except.
From my own experience I can tell you that sometimes you're sure there cannot possibly be more than one matching object in the database, and yet there will be two... (except of course when getting the object by its primary key).

Sorry to add one more take on this issue, but I am using the django paginator, and in my data admin app, the user is allowed to pick what to query on. Sometimes that is the id of a document, but otherwise it is a general query returning more than one object, i.e., a Queryset.
If the user queries the id, I can run:
Record.objects.get(pk=id)
which throws an error in django's paginator, because it is a Record and not a Queryset of Records.
I need to run:
Record.objects.filter(pk=id)
Which returns a Queryset with one item in it. Then the paginator works just fine.

".get()" can return one object:
{
"name": "John",
"age": "26",
"gender": "Male"
}
".filter()" can return a list (set) of one or more objects:
[
{
"name": "John",
"age": "26",
"gender": "Male"
},
{
"name": "Tom",
"age": "18",
"gender": "Male"
},
{
"name": "Marry",
"age": "22",
"gender": "Female"
}
]

Django, using "|": Expression tree is too large (maximum depth 1000)

I'm trying to concatenate many querysets together. I tried out the marked answer from this question a while back, but that didn't work in my case. I needed to return a queryset not a list. So I used the |, from the second answer. This worked fine at the time, but now that I'm trying to use it again for something else I get the following error:
Expression tree is too large (maximum depth 1000)
I originally thought that | would concat the querysets, but after reading the docs it appears that it concats the actual query. And that this specific problem occurs if the query becomes too long/complex.
This is what I'm trying to do:
def properties(self, request, pk=None):
    project = self.get_object()
    if project is None:
        return Response({'detail': 'Missing project id'}, status=404)
    functions = Function.objects.filter(project=project)
    properties = Property.objects.none()
    for function in functions:
        properties = properties | function.property_set.all()
    return Response([PropertySerializer(x).data for x in properties])
Since the functions query returns roughly 1200 results, and each function has about 5 properties, I can understand the query becoming too long/complex.
How can I prevent the query from becoming too complex? Or how can I execute multiple queries and concat them afterwards, while keeping the end result a queryset?

I think you want to obtain all the Property objects whose Function belongs to a certain project.
We can query this with:
properties = Property.objects.filter(function__project=project)
This is a queryset that contains all Property objects whose function (I assume this is a ForeignKey) has the given project as its project (probably again a ForeignKey). It still results in a single query, but you avoid constructing gigantic unions.
Alternatively, you can do it in two steps, but this would actually make it slower:
# probably less efficient
function_ids = (Function.objects.filter(project=project)
                .values_list('pk', flat=True))
properties = Property.objects.filter(function_id__in=function_ids)
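For a feel of the SQL behind the two variants, here is a minimal sqlite3 sketch. The table names app_function and app_property and their columns are assumptions made for the example; the real names depend on your app label and field definitions. Both statements stay the same size no matter how many functions the project has, unlike a chain of 1200 ORed querysets.

```python
import sqlite3

# Hypothetical tables standing in for the Function and Property models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_function (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE app_property (id INTEGER PRIMARY KEY, function_id INTEGER);
    INSERT INTO app_function (id, project_id) VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO app_property (id, function_id)
        VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# One join, roughly what filter(function__project=project) asks for.
joined = conn.execute("""
    SELECT p.id FROM app_property p
    JOIN app_function f ON f.id = p.function_id
    WHERE f.project_id = ?
    ORDER BY p.id
""", (10,)).fetchall()

# The two-step variant expressed as an IN subquery.
nested = conn.execute("""
    SELECT id FROM app_property
    WHERE function_id IN (SELECT id FROM app_function WHERE project_id = ?)
    ORDER BY id
""", (10,)).fetchall()

print(joined)  # [(1,), (2,), (3,)]
```

Both forms return the same rows here; which one the database executes faster depends on the planner and the indexes available.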

Provide a hint in bulk upserts

Is there a way to provide a hint for an upsert in a bulk in MongoDB / Python?
I would like to add a hint in a query like: Bulk.find(<query>).upsert().update(<update>).
I have tried:
Bulk.find(<query>).hint(<index>).upsert().update(<update>): .hint() method does not exist.
Bulk.find({'$query': <query>, '$hint': <hint>}).upsert().update(<update>): one cannot mix {$query: <query>} syntax with method chaining (see this & this for example).
Am I missing something?

This is not so much about Bulk Operations but is rather about the general behavior of queries in "update" statements. See SERVER-1599.
The operation formats supported by the basic Op_Query behind .find() have never been supported in update statements. The same is true of the Bulk API, because its .find() is a separate method that belongs to the Bulk API and is unrelated to the basic collection method, hence the missing .hint() method.
So the special forms such as $query do not work even with .update() in its basic form. But there is something you can do as of MongoDB 2.6 to influence the index chosen by the query.
The new addition here is "index filters", which let you set up a list of indexes to be considered for a given "query shape". The main definition is made through the planCacheSetFilter command. This allows you to do something like the following (just in the shell for brevity):
db.junk.ensureIndex({ "b": 1, "a": 1 })
db.runCommand({
    "planCacheSetFilter": "junk",
    "query": { "a": 1 },
    "indexes": [
        { "b": 1, "a": 1 }
    ]
})
The values provided in the "query" argument there are irrelevant, but what is important is the "shape". So regardless of what data is being queried for, as long as the "shape" is basically the same then the filter set is considered. i.e:
db.junk.find({ "a": 1 }).explain(1).filterSet; // returns true
db.junk.find({ "a": 2 }).explain(1).filterSet; // returns true
db.junk.find({ "b": 1 }).explain(1).filterSet; // returns false, different shape
Unlike the direct form of $hint, this will work with both .update() statements or in the Bulk .find().update() chain as a way to provide an index choice for the query operation.
Beware though that this is not a "permanent" setting, nor is it able to be isolated to a singular operation or sequence of operations. This "filter" will stay in the plan cache once set until the server instance is restarted. You can alternately clear it with the planCacheClearFilters command.
So until that JIRA issue is resolved, index filters are the only way to achieve what you are asking, short of adding extra conditions to the query itself to steer the planner toward the intended index.
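Since the question is about Python, here is a hedged sketch of building the same command document on the Python side. plan_cache_set_filter is a hypothetical helper, not part of any driver; it only assembles the dict. With pymongo, a document like this is normally passed to the database's command() method, but that call is left out because it needs a running server.

```python
def plan_cache_set_filter(collection, query_shape, indexes):
    """Build the planCacheSetFilter command document (hypothetical helper)."""
    # The command name must be the first key; dicts keep insertion order.
    return {
        "planCacheSetFilter": collection,
        "query": query_shape,
        "indexes": list(indexes),
    }


command = plan_cache_set_filter("junk", {"a": 1}, [{"b": 1, "a": 1}])
print(command)
```

The values mirror the shell example above: collection "junk", query shape { "a": 1 }, and the compound index on b and a.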

sqlalchemy, use a callable in pre-defined query filter

I have a lot of entities that I would like to filter in the same way, using several similar criteria. I would like to prepare these criteria once and then just apply them when I need them.
I could write something like:
filter = Entity.owner==some_user
and then query:
query = session.query(Entity).... #some more
query = query.filter(filter)
That's OK when some_user is predefined. Now I need to pass a callable there, so that it is evaluated when the query is actually built: say, replace the some_user variable with the result of a get_current_user() call.
You may notice that these three ways will not work:
filter = Entity.owner==get_current_user
filter = Entity.owner==get_current_user()
filter = Entity.owner==lambda: get_current_user()
How do I do it?

If you are OK with evaluating them just before you build the query, which should serve the purpose of storing them just fine, you could store them in a callable:
filter = lambda: Entity.owner == get_current_user()
query = session.query(Entity).... #some more
query = query.filter(filter())
You can store expressions of any complexity this way. Use regular functions for more verbose cases.
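The same idea without SQLAlchemy, as a minimal sketch: each stored criterion is a zero-argument callable, and the current user is only looked up when the callable runs. The session dict, owner_is and run_query are hypothetical names for illustration, and plain dicts stand in for mapped entities.

```python
session = {"user": "alice"}  # hypothetical place where the current user lives


def get_current_user():
    return session["user"]


def owner_is(user):
    # Builds a predicate for one fixed user value.
    return lambda entity: entity["owner"] == user


# Stored unevaluated: get_current_user() runs only when the criterion is called.
criteria = {"owned": lambda: owner_is(get_current_user())}

entities = [{"id": 1, "owner": "alice"}, {"id": 2, "owner": "bob"}]


def run_query(name):
    predicate = criteria[name]()  # evaluated at query build time
    return [e["id"] for e in entities if predicate(e)]


first = run_query("owned")
session["user"] = "bob"
second = run_query("owned")

print(first, second)  # [1] [2]
```

The stored criterion never changes, yet each query picks up whoever the current user is at the moment it is built.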

Duplicate an AppEngine Query object to create variations of a filter without affecting the base query

In my AppEngine project I have a need to use a certain filter as a base then apply various different extra filters to the end, retrieving the different result sets separately. e.g.:
basequery = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various sub queries separately:
subquery1 = basequery.filter('subfilter1', 'xyz')
#Do something with subquery1 results here
subquery2 = basequery.filter('subfilter2', 'abc')
#Do something with subquery2 results here
Unfortunately 'filter()' affects the state of the basequery Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duping an object that could be used?
The extra filters are actually applied by the results of different forms dynamically within a wizard, and they use the 'running total' of the query in their branch to assess whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.

There's no officially approved (i.e., not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.

As Nick has said, you'd better create the query again, but you can still avoid repeating yourself. A good way to do that would be like this:
# inside a request handler
def create_base_query():
    return MyModel.all().filter('mainfilter', 123)

subquery1 = create_base_query().filter('subfilter1', 'xyz')
# Do something with subquery1 results here
subquery2 = create_base_query().filter('subfilter2', 'abc')
# Do something with subquery2 results here
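To make the difference concrete, here is a small self-contained sketch. MutableQuery is a hypothetical class that only imitates the in-place filter() behaviour described in the question; it is not the App Engine Query class.

```python
class MutableQuery:
    """Hypothetical query whose filter() mutates in place, as described."""

    def __init__(self):
        self.filters = []

    def filter(self, name, value):
        self.filters.append((name, value))
        return self


def create_base_query():
    # A fresh object on every call, so no state is shared between branches.
    return MutableQuery().filter("mainfilter", 123)


shared = create_base_query()
shared.filter("subfilter1", "xyz")
leaked = shared.filter("subfilter2", "abc").filters  # carries subfilter1 too

fresh = create_base_query().filter("subfilter2", "abc").filters

print(len(leaked), len(fresh))  # 3 2
```

Reusing one object accumulates every filter ever applied, while the factory gives each branch exactly the base filter plus its own.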
