I have to querysets. alllists and subscriptionlists
alllists = List.objects.filter(datamode = 'A')
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
I need a queryset called unsubscriptionlist, which possess all records in alllists except the records in subscription lists. How to achieve this?
Since Django 1.11, QuerySets have a difference() method amongst other new methods:
# Capture elements that are in qs_all but not in qs_part
qs_diff = qs_all.difference(qs_part)
Also see: https://stackoverflow.com/a/45651267/5497962
You should be able to use the set operation difference to help:
set(alllists).difference(set(subscriptionlists))
Well I see two options here.
1. Filter things manually (quite ugly)
diff = []
for all in alllists:
found = False
for sub in subscriptionlists:
if sub.id == all.id:
found = True
break
if not found:
diff.append(all)
2. Just make another query
diff = List.objects.filter(datamode = 'A').exclude(member__id=memberid, datamode='A')
How about:
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
unsubscriptionlists = Membership.objects.exclude(member__id=memberid, datamode='A')
The unsubscriptionlists should be the inverse of subscription lists.
Brian's answer will work as well, though set() will most likely evaluate the query and will take a performance hit in evaluating both sets into memory. This method will keep the lazy initialization until you need the data.
In case anyone's searching for a way to do symmetric difference, such operator is not available in Django.
That said, it's not that hard to implement it using difference and union, and it'll all be done in a single query:
q1.difference(q2).union(q2.difference(q1))
Related
Let's imagine I have this model and I would like to sort them by logical operation n1 != n2:
class Thing(Model):
n1 = IntegerField()
n2 = IntegerField()
...
def is_different(self):
return self.n1 != self.n2
If I sort them by sorted built-in function, I found that it does not return a Queryset, but a list:
things = Thing.objects.all()
sorted_things = sorted(things, key=lambda x: x.is_different())
Now, if I use annotate
sorted_things = things.annotate(diff=(F('n1') != F('n2'))).order_by('diff')
it raises the following error: AttributeError: 'bool' object has no attribute 'resolve_expression'.
I found a solution using extra queryset:
sorted_things = things.extra(select={'diff': 'n1!=n2'}).order_by('diff')
but following Django docs (https://docs.djangoproject.com/en/2.0/ref/models/querysets/#extra):
Use this method as a last resort
This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket using the QuerySet.extra keyword with your use case (please check the list of existing tickets first) so that we can enhance the QuerySet API to allow removing extra(). We are no longer improving or fixing bugs for this method.
Then, what is the optimal way to do it?
Thanks!
Conditional expressions
One option for it is to use conditional expressions. They provide simple way of checking conditions and providing one of values depending on them. In your case it will look like:
sorted_things = things.annotate(diff=Case(When(n1=F('n2'), then=True), default=False, output_field=BooleanField())).order_by('diff')
Q and ExpressionWrapper
There is another, a bit hacky way, to achieve that by combining usage of Q and ExpressionWrapper.
In django, Q is intended to be used inside filter(), exclude(), Case etc. but it simply creates condition that apparently can be used anywhere. It has only one drawback: it doesn't define what type is outputting (it's always boolean and django can assume that in every case when Q is intended to be used.
But there comes ExpressionWrapper that allows you to wrap any expression and define it's final output type. That way we can simply wrap Q expression (or more than one Q expresisons glued together using &, | and brackets) and define by hand what type it outputs.
Be aware that this is undocumented, so this behavior may change in future, but I've checked it using django versions 1.8, 1.11 and 2.0 and it works fine
Example:
sorted_things = things.annotate(diff=ExpressionWrapper(Q(n1=F('n2')), output_field=BooleanField())).order_by('diff')
You can work around it using Func() expressions.
from django.db.models import Func, F
class NotEqual(Func):
arg_joiner = '<>'
arity = 2
function = ''
things = Thing.objects.annotate(diff=NotEqual(F('n1'), F('n2'))).order_by('diff')
According to documentation:
filter(**kwargs) Returns a new QuerySet containing objects that match
the given lookup parameters.
The lookup parameters (**kwargs) should be in the format described in
Field lookups below. Multiple parameters are joined via AND in the
underlying SQL statement.
Which to me suggests it will return a subset of items that were in original set.
However I seem to be missing something as below example does not behave as I would expect:
>>> kids = Kid.objects.all()
>>> tuple(k.name for k in kids)
(u'Bob',)
>>> toys = Toy.objects.all()
>>> tuple( (t.name, t.owner.name) for t in toys)
((u'car', u'Bob'), (u'bear', u'Bob'))
>>> subsel = Kid.objects.filter( owns__in = toys )
>>> tuple( k.name for k in subsel )
(u'Bob', u'Bob')
>>> str(subsel.query)
'SELECT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
As you can see in above subsel ends up returning duplicate records, which is not what I wanted. My question is what is the proper way to get subset? (note: set by definition will not have multiple occurrences of the same object)
Explanation as to why it behaves like that would be also nice, as to me filter means what you achieve with filter() built-in function in Python. Which is: take elements that fulfill requirement (or in other words discard ones that do not). And this definition doesn't seem to allow introduction/duplication of objects.
I know can aplly distinct() to the whole thing, but that still results in rather ugly (and probably slower than could be) query:
>>> str( subsel.distinct().query )
'SELECT DISTINCT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
My models.py for completeness:
from django.db import models
class Kid(models.Model):
name = models.CharField(max_length=200)
class Toy(models.Model):
name = models.CharField(max_length=200)
owner = models.ForeignKey(Kid, related_name='owns')
edit:
After a chat with #limelight the conclusion is that my problem is that I expect filter() to behave according to dictionary definition. And i.e. how it works in Python or any other sane framework/language.
More precisely if I have set A = {x,y,z} and I invoke A.filter( <predicate> ) I don't expect any elements to get duplicated. With Django's QuerySet however it behaves like this:
A = {x,y,z}
A.filter( <predicate> )
# now A i.e. = {x,x}
So first of all the issue is inappropriate method name (something like match() would be much better).
Second thing is that I think it is possible to create more efficient query than what Django allows me to. I might be wrong on that, if I will have a bit of time I will probably try to check if that is true.
This is kind of ugly, but works (without any type safety):
toy_owners = Toy.objects.values("owner_id") # optionally with .distinct()
Kid.objects.filter(id__in=toy_owners)
If performance is not an issue, I think #limelights is right.
PS! I tested your query on Django 1.6b2 and got the same unnecessary complex query.
Instead DISTINCT you can use GROUP BY (annotate in django) to get distinct kids.
toy_owners = Toy.objects.values_list("owner_id", flat=True).distinct()
Kid.objects.only('name').filter(pk__in=toy_owners).annotate(count=Count('owns'))
If I want to check for the existence and if possible retrieve an object, which of the following methods is faster? More idiomatic? And why? If not either of the two examples I list, how else would one go about doing this?
if Object.objects.get(**kwargs).exists():
my_object = Object.objects.get(**kwargs)
my_object = Object.objects.filter(**kwargs)
if my_object:
my_object = my_object[0]
If relevant, I care about mysql and postgres for this.
Why not do this in a try/except block to avoid the multiple queries / query then an if?
try:
obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
pass
Just add your else logic under the except.
django provides a pretty good overview of exists
Using your first example it will do the query two times, according to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to be using the object, after checking for existance, the docs suggest just using it and forcing evaluation 1 time using
if my_object:
pass
Does the order select_related is put in a queryset chain matter?
i.e. is there any difference between:
SomeModel.objects.select_related().all()
and
SomeModel.objects.all().select_related()
In my brief testing they both seem to cache objects but I'm wondering if there are any performance differences or anything else I'm not realizing is different?
They both execute the same exact query. So no, there would be no performance differences.
To test, try this:
q = SomeModel.objects.select_related().all()
print q.query
q = SomeModel.objects.all().select_related()
print q.query
You should get the same exact query
In my AppEngine project I have a need to use a certain filter as a base then apply various different extra filters to the end, retrieving the different result sets separately. e.g.:
base_query = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various sub queries separately:
subquery1 = basequery.filter('subfilter1', 'xyz')
#Do something with subquery1 results here
subquery2 = basequery.filter('subfilter2', 'abc')
#Do something with subquery2 results here
Unfortunately 'filter()' affects the state of the basequery Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duping an object that could be used?
The extra filters are actually applied by the results of different forms dynamically within a wizard, and they use the 'running total' of the query in their branch to assess whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.
There's no officially approved (Eg, not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.
As Nick has said, you better create the query again, but you can still avoid repeating yourself. A good way to do that would be like this:
#inside a request handler
def create_base_query():
return MyModel.all().filter('mainfilter', 123)
subquery1 = create_base_query().filter('subfilter1', 'xyz')
#Do something with subquery1 results here
subquery2 = create_base_query().filter('subfilter2', 'abc')
#Do something with subquery2 results here