Django - Optimal way to sort models by boolean operation

Let's imagine I have this model and I would like to sort its instances by the logical expression n1 != n2:
class Thing(Model):
    n1 = IntegerField()
    n2 = IntegerField()
    ...

    def is_different(self):
        return self.n1 != self.n2
If I sort them with the built-in sorted() function, I get a list back, not a QuerySet:
things = Thing.objects.all()
sorted_things = sorted(things, key=lambda x: x.is_different())
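To illustrate why the result is a plain list: sorted() accepts any iterable and always builds a new list, and because False sorts before True, the items whose key is False come first. A minimal sketch, assuming a stand-in dataclass in place of the Django model (the dataclass and the sample values are hypothetical, chosen so the snippet runs without a database):

```python
from dataclasses import dataclass


@dataclass
class Thing:
    """Stand-in for the Django model; only the two integer fields matter."""
    n1: int
    n2: int

    def is_different(self) -> bool:
        return self.n1 != self.n2


things = [Thing(1, 2), Thing(3, 3), Thing(5, 4), Thing(7, 7)]

# sorted() always returns a new list, whatever iterable it is given.
# False sorts before True, so equal pairs come first; the sort is stable,
# so the original relative order is kept inside each group.
sorted_things = sorted(things, key=lambda t: t.is_different())
```

With a real queryset the same call also evaluates the whole queryset in Python, which is why a database-side ordering is preferable for large tables.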
Now, if I use annotate
sorted_things = things.annotate(diff=(F('n1') != F('n2'))).order_by('diff')
it raises the following error: AttributeError: 'bool' object has no attribute 'resolve_expression'.
I found a solution using the extra() queryset method:
sorted_things = things.extra(select={'diff': 'n1!=n2'}).order_by('diff')
but according to the Django docs (https://docs.djangoproject.com/en/2.0/ref/models/querysets/#extra):
Use this method as a last resort
This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket using the QuerySet.extra keyword with your use case (please check the list of existing tickets first) so that we can enhance the QuerySet API to allow removing extra(). We are no longer improving or fixing bugs for this method.
Then, what is the optimal way to do it?
Thanks!

Conditional expressions
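One option is to use conditional expressions. They provide a simple way of checking conditions and returning one of several values depending on them. In your case it will look like this (note that diff here is True when n1 equals n2, the inverse of is_different(); swap then and default, or order by '-diff', if you need the n1 != n2 ordering):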
sorted_things = things.annotate(diff=Case(When(n1=F('n2'), then=True), default=False, output_field=BooleanField())).order_by('diff')
Q and ExpressionWrapper
There is another, slightly hacky way to achieve this: combining Q and ExpressionWrapper.
In Django, Q is intended to be used inside filter(), exclude(), Case etc., but it simply creates a condition that apparently can be used anywhere. It has only one drawback: it doesn't define its output type (the output is always boolean, and Django can assume that wherever Q is intended to be used).
This is where ExpressionWrapper comes in: it lets you wrap any expression and define its final output type. That way we can simply wrap a Q expression (or several Q expressions combined using &, | and brackets) and declare by hand what type it outputs.
Be aware that this is undocumented, so this behavior may change in the future, but I've checked it with Django versions 1.8, 1.11 and 2.0 and it works fine.
Example:
sorted_things = things.annotate(diff=ExpressionWrapper(Q(n1=F('n2')), output_field=BooleanField())).order_by('diff')

You can work around it using Func() expressions.
from django.db.models import Func, F

class NotEqual(Func):
    arg_joiner = '<>'
    arity = 2
    function = ''

things = Thing.objects.annotate(diff=NotEqual(F('n1'), F('n2'))).order_by('diff')
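For reference, the NotEqual annotation above should end up as an ordering on the SQL expression n1 <> n2. The sketch below shows that ordering directly with the standard-library sqlite3 module; the table name thing and the sample rows are assumptions made for the example, not the schema Django would necessarily generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, n1 INTEGER, n2 INTEGER)")
conn.executemany(
    "INSERT INTO thing (id, n1, n2) VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 3, 3), (3, 5, 4), (4, 7, 7)],
)

# SQLite evaluates n1 <> n2 to 0 (equal) or 1 (different), so ordering by it
# puts the equal rows first; id is added as a tie-breaker for a stable result.
rows = conn.execute(
    "SELECT id, n1 <> n2 AS diff FROM thing ORDER BY diff, id"
).fetchall()
conn.close()
```

The sorting happens entirely in the database, so the ORM version keeps returning a lazy QuerySet instead of a list.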

Related

Django, using "|": Expression tree is too large (maximum depth 1000)

I'm trying to concatenate many querysets together. I tried out the marked answer from this question a while back, but that didn't work in my case. I needed to return a queryset not a list. So I used the |, from the second answer. This worked fine at the time, but now that I'm trying to use it again for something else I get the following error:
Expression tree is too large (maximum depth 1000)
I originally thought that | would concatenate the querysets, but after reading the docs it appears that it combines the underlying queries, and that this specific problem occurs if the resulting query becomes too long/complex.
This is what I'm trying to do:
def properties(self, request, pk=None):
    project = self.get_object()
    if project is None:
        return Response({'detail': 'Missing project id'}, status=404)
    functions = Function.objects.filter(project=project)
    properties = Property.objects.none()
    for function in functions:
        properties = properties | function.property_set.all()
    return Response([PropertySerializer(x).data for x in properties])
Since the functions query returns roughly 1200 results, and each function has about 5 properties, I can understand the query becoming too long/complex.
How can I prevent the query from becoming too complex? Or how can I execute multiple queries and concat them afterwards, while keeping the end result a queryset?

I think you want to obtain all the Property objects whose Function belongs to a certain project.
We can query this with:
properties = Property.objects.filter(function__project=project)
This is a queryset that contains all Property objects for which the function (I assume this is a ForeignKey) has the given project as its project (probably again a ForeignKey). This results in a single query as well, but you avoid constructing gigantic unions.
Alternatively, you can do it in two steps, but this would actually make it slower:
# probably less efficient
function_ids = (Function.objects.filter(project=project)
                .values_list('pk', flat=True))
properties = Property.objects.filter(function_id__in=function_ids)
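As a rough illustration of what the function__project lookup boils down to, here is the equivalent single join written by hand with the standard-library sqlite3 module. The table names app_function and app_property, the column names, and the sample rows are all assumptions for this sketch; the real schema depends on the models:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_function (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE app_property (id INTEGER PRIMARY KEY, function_id INTEGER);
    INSERT INTO app_function (id, project_id) VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO app_property (id, function_id) VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    """
)

# One join answers "all properties whose function belongs to project 10",
# however many functions the project has; no per-function query is OR-ed in.
property_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.id
        FROM app_property AS p
        INNER JOIN app_function AS f ON p.function_id = f.id
        WHERE f.project_id = ?
        ORDER BY p.id
        """,
        (10,),
    )
]
conn.close()
```

The size of the statement stays constant as the number of functions grows, which is what avoids the expression-depth limit hit by the | loop.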

Django - how to filter using QuerySet to get subset of objects?

According to documentation:
filter(**kwargs) Returns a new QuerySet containing objects that match
the given lookup parameters.
The lookup parameters (**kwargs) should be in the format described in
Field lookups below. Multiple parameters are joined via AND in the
underlying SQL statement.
Which to me suggests it will return a subset of items that were in original set.
However I seem to be missing something as below example does not behave as I would expect:
>>> kids = Kid.objects.all()
>>> tuple(k.name for k in kids)
(u'Bob',)
>>> toys = Toy.objects.all()
>>> tuple( (t.name, t.owner.name) for t in toys)
((u'car', u'Bob'), (u'bear', u'Bob'))
>>> subsel = Kid.objects.filter( owns__in = toys )
>>> tuple( k.name for k in subsel )
(u'Bob', u'Bob')
>>> str(subsel.query)
'SELECT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
As you can see in above subsel ends up returning duplicate records, which is not what I wanted. My question is what is the proper way to get subset? (note: set by definition will not have multiple occurrences of the same object)
Explanation as to why it behaves like that would be also nice, as to me filter means what you achieve with filter() built-in function in Python. Which is: take elements that fulfill requirement (or in other words discard ones that do not). And this definition doesn't seem to allow introduction/duplication of objects.
I know I can apply distinct() to the whole thing, but that still results in a rather ugly (and probably slower than necessary) query:
>>> str( subsel.distinct().query )
'SELECT DISTINCT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
My models.py for completeness:
from django.db import models

class Kid(models.Model):
    name = models.CharField(max_length=200)

class Toy(models.Model):
    name = models.CharField(max_length=200)
    owner = models.ForeignKey(Kid, related_name='owns')
edit:
After a chat with #limelight the conclusion is that my problem is that I expect filter() to behave according to the dictionary definition, i.e. the way it works in Python or any other sane framework/language.
More precisely if I have set A = {x,y,z} and I invoke A.filter( <predicate> ) I don't expect any elements to get duplicated. With Django's QuerySet however it behaves like this:
A = {x,y,z}
A.filter( <predicate> )
# now A is e.g. {x,x}
So first of all, the issue is an inappropriate method name (something like match() would be much better).
The second thing is that I think it is possible to create a more efficient query than what Django allows me to. I might be wrong about that; if I have a bit of time I will probably try to check whether that is true.

This is kind of ugly, but works (without any type safety):
toy_owners = Toy.objects.values("owner_id") # optionally with .distinct()
Kid.objects.filter(id__in=toy_owners)
If performance is not an issue, I think #limelights is right.
PS! I tested your query on Django 1.6b2 and got the same unnecessarily complex query.
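To see where the duplicates come from, it can help to run both query shapes by hand. The sketch below uses the standard-library sqlite3 module with the bug_kid and bug_toy tables from the SQL in the question; the sample rows mirror the question's data, and the second statement is an approximation of what the id__in version asks the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bug_kid (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bug_toy (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO bug_kid (id, name) VALUES (1, 'Bob');
    INSERT INTO bug_toy (id, name, owner_id) VALUES (1, 'car', 1), (2, 'bear', 1);
    """
)

# Filtering across the reverse relation joins kid to toy: one row per matching toy.
joined = conn.execute(
    """
    SELECT k.name
    FROM bug_kid AS k
    INNER JOIN bug_toy AS t ON k.id = t.owner_id
    WHERE t.id IN (SELECT id FROM bug_toy)
    """
).fetchall()

# Filtering on the kid's own id with a subquery never multiplies kid rows.
subquery = conn.execute(
    "SELECT name FROM bug_kid WHERE id IN (SELECT owner_id FROM bug_toy)"
).fetchall()
conn.close()
```

The join produces one output row per (kid, toy) pair, so a kid with two toys appears twice; the subquery form only tests membership and therefore returns each kid at most once.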

Instead of DISTINCT you can use GROUP BY (annotate in Django) to get distinct kids.
from django.db.models import Count
toy_owners = Toy.objects.values_list("owner_id", flat=True).distinct()
Kid.objects.only('name').filter(pk__in=toy_owners).annotate(count=Count('owns'))

Idiomatic/fast Django ORM check for existence on mysql/postgres

If I want to check for the existence of an object and, if possible, retrieve it, which of the following methods is faster? More idiomatic? And why? If neither of the two examples I list, how else would one go about doing this?
# Option 1: check with exists(), then fetch
if Object.objects.filter(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)

# Option 2: fetch the queryset and test its truthiness
my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about mysql and postgres for this.

Why not do this in a try/except block, to avoid running multiple queries or a query followed by an if?
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.

Django's documentation provides a pretty good overview of exists().
Your first example will run the query twice; according to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to use the object after checking for existence, the docs suggest just using it and forcing evaluation once with:
if my_object:
    pass
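The same "fetch once, then test the result" idea can be shown below the ORM, at the DB-API level. This is only a sketch with the standard-library sqlite3 module; the item table, its columns, and the fetch_item helper are hypothetical names introduced for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item (id, name) VALUES (1, 'hops')")


def fetch_item(name):
    """Return the first matching row, or None; exactly one query either way."""
    return conn.execute(
        "SELECT id, name FROM item WHERE name = ? LIMIT 1", (name,)
    ).fetchone()


found = fetch_item("hops")     # a row tuple when a match exists
missing = fetch_item("malt")   # None when nothing matches
```

A single round trip both answers the existence question and delivers the object, which is the point made against pairing exists() with a second get().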

A more pythonic way to build a class based on a string (how not to use eval)

OK.
So I've got a database where I want to store references to other Python objects (right now I'm using it to store inventory information for personal stores of beer recipe ingredients).
Since there are about 15-20 different categories of ingredients (all represented by individual SQLObjects) I don't want to do a bunch of RelatedJoin columns since, well, I'm lazy, and it seems like it's not the "best" or "pythonic" solution as it is.
So right now I'm doing this:
class Inventory(SQLObject):
    inventory_item_id = IntCol(default=0)
    amount = DecimalCol(size=6, precision=2, default=0)
    amount_units = IntCol(default=Measure.GM)
    purchased_on = DateCol(default=datetime.now())
    purchased_from = UnicodeCol(default=None, length=256)
    price = CurrencyCol(default=0)
    notes = UnicodeCol(default=None)
    inventory_type = UnicodeCol(default=None)

    def _get_name(self):
        return eval(self.inventory_type).get(self.inventory_item_id).name

    def _set_inventory_item_id(self, value):
        self.inventory_type = value.__class__.__name__
        self._SO_set_inventory_item_id(value.id)
Please note the ICKY eval() in the _get_name() method.
How would I go about calling the SQLObject class referenced by the string I'm getting from __class__.__name__ without using eval()? Or is this an appropriate place to utilize eval()? (I'm sort of of the mindset that it's never appropriate to use eval() -- however, since the system never uses any end user input in the eval(), it seems "safe".)

To get the value of a global by name, use:
globals()[self.inventory_type]
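A variation on the lookup above is an explicit registry dict instead of globals(), so that only known classes can be resolved from a stored string. This is a sketch under assumptions: Hop and Grain are hypothetical stand-ins for the SQLObject ingredient classes, and INGREDIENT_TYPES and resolve_type are names invented for the example:

```python
class Hop:
    name = "Cascade"


class Grain:
    name = "Pale malt"


# Explicit registry: only classes listed here can be resolved from a string,
# unlike eval(), which would execute any expression it is handed.
INGREDIENT_TYPES = {cls.__name__: cls for cls in (Hop, Grain)}


def resolve_type(type_name):
    """Look up an ingredient class by its stored class name."""
    try:
        return INGREDIENT_TYPES[type_name]
    except KeyError:
        raise ValueError(f"unknown inventory type: {type_name!r}") from None


hop_class = resolve_type("Hop")
```

Compared with globals(), the registry costs one extra line per class but makes the set of allowed types visible in one place and turns a bad value into a clear error.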

How to get the difference of two querysets in Django?

I have two querysets, alllists and subscriptionlists:
alllists = List.objects.filter(datamode = 'A')
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
I need a queryset called unsubscriptionlist, which contains all records in alllists except the records in subscriptionlists. How can I achieve this?

Since Django 1.11, QuerySets have a difference() method amongst other new methods:
# Capture elements that are in qs_all but not in qs_part
qs_diff = qs_all.difference(qs_part)
Also see: https://stackoverflow.com/a/45651267/5497962
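According to the Django documentation, difference() uses SQL's EXCEPT operator. A minimal sketch of that operator with the standard-library sqlite3 module; the all_lists and subscribed tables and their contents are assumptions made up for the example, not the models from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE all_lists (id INTEGER PRIMARY KEY);
    CREATE TABLE subscribed (list_id INTEGER);
    INSERT INTO all_lists (id) VALUES (1), (2), (3), (4);
    INSERT INTO subscribed (list_id) VALUES (2), (4);
    """
)

# EXCEPT keeps the rows of the first SELECT that the second one does not return.
unsubscribed = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM all_lists EXCEPT SELECT list_id FROM subscribed ORDER BY 1"
    )
]
conn.close()
```

Both SELECTs must return the same number of columns, which is why the example compares ids on both sides rather than whole rows from two different tables.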

You should be able to use the set operation difference to help:
set(alllists).difference(set(subscriptionlists))

Well I see two options here.
1. Filter things manually (quite ugly)
diff = []
for all in alllists:
    found = False
    for sub in subscriptionlists:
        if sub.id == all.id:
            found = True
            break
    if not found:
        diff.append(all)
2. Just make another query
diff = List.objects.filter(datamode = 'A').exclude(member__id=memberid, datamode='A')

How about:
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
unsubscriptionlists = Membership.objects.exclude(member__id=memberid, datamode='A')
The unsubscriptionlists should be the inverse of subscription lists.
Brian's answer will work as well, though set() will most likely evaluate the query and will take a performance hit in evaluating both sets into memory. This method will keep the lazy initialization until you need the data.

In case anyone's searching for a way to do a symmetric difference: no such operator is available in Django.
That said, it's not that hard to implement it using difference and union, and it'll all be done in a single query:
q1.difference(q2).union(q2.difference(q1))
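The shape of that expression is easiest to check on plain Python sets, where the symmetric difference operator ^ exists. The two sets of primary keys below are made-up sample values standing in for the querysets:

```python
q1_ids = {1, 2, 3, 4}
q2_ids = {3, 4, 5}

# (q1 - q2) | (q2 - q1): the same shape as q1.difference(q2).union(q2.difference(q1)).
symmetric = (q1_ids - q2_ids) | (q2_ids - q1_ids)
```

The result contains exactly the keys that appear in one collection but not in both, the same set that q1_ids ^ q2_ids produces.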
