Does the order select_related is put in a queryset chain matter?
i.e. is there any difference between:
SomeModel.objects.select_related().all()
and
SomeModel.objects.all().select_related()
In my brief testing they both seem to cache objects but I'm wondering if there are any performance differences or anything else I'm not realizing is different?
They both execute the same exact query. So no, there would be no performance differences.
To test, try this:
q = SomeModel.objects.select_related().all()
print(q.query)
q = SomeModel.objects.all().select_related()
print(q.query)
You should get the exact same query both times.
Suppose I want to run exclude() repeatedly, taking the values from exclude_list, e.g. ['aa', 'ab', 'ac'].
I can do that using a loop:
for exclude_value in exclude_list:
    myQueryset.exclude(variable__startswith=exclude_value)
However, I'd like to do that using the itertools.chain command as I've read it is capable of doing so. Any suggestions?
What you are doing is the correct approach, except for one small detail: you aren't retaining the excludes. Each call to exclude() returns a new queryset, so the result has to be assigned back. Django querysets are also lazily evaluated, so running through a loop and continually chaining does nothing against the database until you try to access something from the set.
If you do this:
qs = MyModel.objects
for exclude_value in exclude_list:
    qs = qs.exclude(variable__startswith=exclude_value)
qs = None
The database is never hit.
So do this:
qs = MyModel.objects
for exclude_value in exclude_list:
    qs = qs.exclude(variable__startswith=exclude_value)
qs.count()  # Or whatever you want the queryset for
and you should be fine. If and when you experience database slowdown, which will likely be caused by the large number of free-text expressions in the query, do some profiling first; then you can look for a more efficient approach.
But I'd wager the above code will be sufficient for your needs.
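To make the "retain the result" point concrete, here is a minimal sketch in plain Python. LazyQuery and exclude_startswith are hypothetical names invented for this illustration; this is a toy stand-in for a lazily evaluated queryset, not how Django implements it.

```python
class LazyQuery:
    """Toy stand-in for a lazy queryset (hypothetical, not Django's code)."""

    def __init__(self, rows, excluded_prefixes=()):
        self._rows = rows
        self._excluded_prefixes = tuple(excluded_prefixes)

    def exclude_startswith(self, prefix):
        # Returns a NEW object; the object it was called on is unchanged.
        return LazyQuery(self._rows, self._excluded_prefixes + (prefix,))

    def __iter__(self):
        # The filtering work only happens here, when results are requested.
        for row in self._rows:
            if not row.startswith(self._excluded_prefixes):
                yield row


rows = ["aardvark", "abacus", "acorn", "badger"]
exclude_list = ["aa", "ab", "ac"]

# Loop from the question: the return value is thrown away each time.
discarded = LazyQuery(rows)
for value in exclude_list:
    discarded.exclude_startswith(value)

# Loop from the answer: the return value is assigned back.
kept = LazyQuery(rows)
for value in exclude_list:
    kept = kept.exclude_startswith(value)

unfiltered_result = list(discarded)  # nothing was excluded
filtered_result = list(kept)         # all three prefixes were excluded
```

The first loop leaves the original object untouched, which is the same reason the loop in the question appears to do nothing.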
According to documentation:
filter(**kwargs) Returns a new QuerySet containing objects that match
the given lookup parameters.
The lookup parameters (**kwargs) should be in the format described in
Field lookups below. Multiple parameters are joined via AND in the
underlying SQL statement.
Which to me suggests it will return a subset of items that were in original set.
However I seem to be missing something as below example does not behave as I would expect:
>>> kids = Kid.objects.all()
>>> tuple(k.name for k in kids)
(u'Bob',)
>>> toys = Toy.objects.all()
>>> tuple( (t.name, t.owner.name) for t in toys)
((u'car', u'Bob'), (u'bear', u'Bob'))
>>> subsel = Kid.objects.filter( owns__in = toys )
>>> tuple( k.name for k in subsel )
(u'Bob', u'Bob')
>>> str(subsel.query)
'SELECT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
As you can see in above subsel ends up returning duplicate records, which is not what I wanted. My question is what is the proper way to get subset? (note: set by definition will not have multiple occurrences of the same object)
Explanation as to why it behaves like that would be also nice, as to me filter means what you achieve with filter() built-in function in Python. Which is: take elements that fulfill requirement (or in other words discard ones that do not). And this definition doesn't seem to allow introduction/duplication of objects.
I know I can apply distinct() to the whole thing, but that still results in a rather ugly (and probably slower than necessary) query:
>>> str( subsel.distinct().query )
'SELECT DISTINCT "bug_kid"."id", "bug_kid"."name" FROM "bug_kid" INNER JOIN "bug_toy" ON ("bug_kid"."id" = "bug_toy"."owner_id") WHERE "bug_toy"."id" IN (SELECT U0."id" FROM "bug_toy" U0)'
My models.py for completeness:
from django.db import models
class Kid(models.Model):
    name = models.CharField(max_length=200)
class Toy(models.Model):
    name = models.CharField(max_length=200)
    owner = models.ForeignKey(Kid, related_name='owns')
edit:
After a chat with #limelight the conclusion is that my problem is expecting filter() to behave according to its dictionary definition, i.e. the way it works in Python or any other sane framework/language.
More precisely if I have set A = {x,y,z} and I invoke A.filter( <predicate> ) I don't expect any elements to get duplicated. With Django's QuerySet however it behaves like this:
A = {x,y,z}
A.filter( <predicate> )
# now A is e.g. {x, x}
So first of all, the issue is an inappropriate method name (something like match() would be much better).
Second, I think it is possible to write a more efficient query than the one Django generates for me. I might be wrong about that; if I have a bit of time I will probably try to check whether it is true.
This is kind of ugly, but works (without any type safety):
toy_owners = Toy.objects.values("owner_id") # optionally with .distinct()
Kid.objects.filter(id__in=toy_owners)
If performance is not an issue, I think #limelights is right.
PS! I tested your query on Django 1.6b2 and got the same unnecessarily complex query.
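As for why the duplicates appear at all: an INNER JOIN yields one row per matching toy, so a kid with two toys comes back twice. Here is a minimal sketch that reproduces this with the standard library's sqlite3 module instead of Django. The table names are taken from the query in the question; the hand-written SQL is my own approximation of the two approaches, not output captured from the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bug_kid (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bug_toy (
        id INTEGER PRIMARY KEY,
        name TEXT,
        owner_id INTEGER REFERENCES bug_kid(id)
    );
    INSERT INTO bug_kid (id, name) VALUES (1, 'Bob');
    INSERT INTO bug_toy (id, name, owner_id) VALUES (1, 'car', 1);
    INSERT INTO bug_toy (id, name, owner_id) VALUES (2, 'bear', 1);
""")

# Shape of the query from the question: one result row per (kid, toy) pair.
joined = conn.execute("""
    SELECT bug_kid.name FROM bug_kid
    INNER JOIN bug_toy ON bug_kid.id = bug_toy.owner_id
    WHERE bug_toy.id IN (SELECT id FROM bug_toy)
""").fetchall()

# Shape of the id__in approach: no join, so each kid appears at most once.
subquery = conn.execute("""
    SELECT name FROM bug_kid
    WHERE id IN (SELECT owner_id FROM bug_toy)
""").fetchall()
```

Here joined holds Bob twice (once per toy), while subquery holds him once, without needing DISTINCT.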
Instead of DISTINCT you can use GROUP BY (annotate in Django) to get distinct kids.
from django.db.models import Count
toy_owners = Toy.objects.values_list("owner_id", flat=True).distinct()
Kid.objects.only('name').filter(pk__in=toy_owners).annotate(count=Count('owns'))
I want to minimize the number of database queries my application makes, and I am familiarizing myself more with Django's ORM. I am wondering, what are the cases where a query is executed.
For instance, this format is along the lines of the answer I'm looking for (for example purposes, not accurate to my knowledge):
Model.objects.get()
Always launches a query
Model.objects.filter()
Launches a query if objects is empty only
(...)
I am assuming curried filter operations never make additional requests, but from the docs it looks like filter() does indeed make database requests if it's the first thing called.
If you're using test cases, you can use this custom assertion included in django's TestCase: assertNumQueries().
Example:
with self.assertNumQueries(2):
    x = SomeModel.objects.get(pk=1)
    y = x.some_foreign_key_in_object
If the expected number of queries was wrong, you'd see an assertion failed message of the form:
Num queries (expected - actual):
2 : 5
In this example, the foreign key would cause an additional query even though there's no explicit query (get, filter, exclude, etc.).
For this reason, I would take a practical approach: rely on tests or logging instead of trying to learn every case in which Django is supposed to query.
If you don't use unit tests, you may use this other method which prints the actual SQL statements sent by django, so you can have an idea of the complexity of the query, and not just the number of queries:
(DEBUG setting must be set to True)
from django.db import connection
x = SomeModel.objects.get(pk=1)
y = x.some_foreign_key_in_object
print(connection.queries)
The print would show a list of dictionaries, one per query:
[
{'sql': 'SELECT a, b, c, d ... FROM app_some_model', 'time': '0.002'},
{'sql': 'SELECT j, k, ... FROM app_referenced_model JOIN ... blabla ',
'time': '0.004'}
]
Docs on connection.queries.
Of course, you can also combine both methods and print connection.queries in your test cases.
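If you want to see the same idea without a Django project at hand, here is a rough sketch using only the standard library: sqlite3's set_trace_callback records every statement the connection executes, much like connection.queries does. The author and book tables are hypothetical names made up for this example, and the second SELECT plays the role of the implicit foreign-key lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann');
    INSERT INTO book VALUES (1, 'First', 1);
""")

# Record every SQL statement executed from this point on.
executed = []
conn.set_trace_callback(executed.append)

# Query 1: fetch the object itself.
book = conn.execute(
    "SELECT id, title, author_id FROM book WHERE id = ?", (1,)
).fetchone()

# Query 2: following the foreign key costs a second round trip.
author = conn.execute(
    "SELECT id, name FROM author WHERE id = ?", (book[2],)
).fetchone()

conn.set_trace_callback(None)  # stop recording

select_count = sum(
    1 for sql in executed if sql.lstrip().upper().startswith("SELECT")
)
```

Two SELECT statements are recorded, which mirrors the point above: accessing a related object costs a query even though no get() or filter() is written for it.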
See Django's documentation on when querysets are evaluated: https://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
Evaluation in this case means that the query is executed. This mostly happens when you try to access the results, e.g. when calling list() or len() on the queryset or iterating over it.
get() in your example doesn't return a queryset but a model object, so it is evaluated immediately.
If I want to check for the existence and if possible retrieve an object, which of the following methods is faster? More idiomatic? And why? If not either of the two examples I list, how else would one go about doing this?
if Object.objects.get(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)
my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about mysql and postgres for this.
Why not do this in a try/except block to avoid the multiple queries / query then an if?
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.
Django's documentation provides a pretty good overview of exists().
Your first example will run the query twice. According to the documentation:
if some_queryset has not yet been evaluated, but you
know that it will be at some point, then using some_queryset.exists()
will do more overall work (one query for the existence check plus an
extra one to later retrieve the results) than simply using
bool(some_queryset), which retrieves the results and then checks if
any were returned.
So if you're going to use the object after checking for its existence, the docs suggest just using it and forcing evaluation once:
if my_object:
    pass
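The underlying pattern is "fetch at most one row and test the result", so a single query answers both "does it exist?" and "what is it?". Here is a rough sketch of that pattern using the standard library's sqlite3 instead of the ORM. The item table and the get_or_none helper are hypothetical names for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO item VALUES (1, 'widget');
""")


def get_or_none(connection, name):
    """Hypothetical helper: one query, returns the row or None."""
    cursor = connection.execute(
        "SELECT id, name FROM item WHERE name = ? LIMIT 1", (name,)
    )
    # fetchone() returns None when no row matches.
    return cursor.fetchone()


found = get_or_none(conn, "widget")
missing = get_or_none(conn, "gadget")

if found is not None:
    item_id, item_name = found  # use the row; no second query needed
```

In Django itself, the try/except around get() shown above does the same job in one query.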
I have two querysets: alllists and subscriptionlists.
alllists = List.objects.filter(datamode = 'A')
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
I need a queryset called unsubscriptionlist, which contains all records in alllists except the records in subscriptionlists. How can I achieve this?
Since Django 1.11, QuerySets have a difference() method amongst other new methods:
# Capture elements that are in qs_all but not in qs_part
qs_diff = qs_all.difference(qs_part)
Also see: https://stackoverflow.com/a/45651267/5497962
You should be able to use the set operation difference to help:
set(alllists).difference(set(subscriptionlists))
Well I see two options here.
1. Filter things manually (quite ugly)
diff = []
for all in alllists:
    found = False
    for sub in subscriptionlists:
        if sub.id == all.id:
            found = True
            break
    if not found:
        diff.append(all)
2. Just make another query
diff = List.objects.filter(datamode = 'A').exclude(member__id=memberid, datamode='A')
How about:
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
unsubscriptionlists = Membership.objects.exclude(member__id=memberid, datamode='A')
The unsubscriptionlists should be the inverse of subscription lists.
Brian's answer will work as well, though set() will most likely evaluate the query and will take a performance hit in evaluating both sets into memory. This method will keep the lazy initialization until you need the data.
In case anyone's searching for a way to do a symmetric difference: no such operator is available in Django.
That said, it's not that hard to implement it using difference and union, and it'll all be done in a single query:
q1.difference(q2).union(q2.difference(q1))
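At the SQL level, difference() maps to EXCEPT and union() to UNION. The sketch below shows the set logic with the standard library's sqlite3 on two throwaway tables; q1 and q2 here are hypothetical tables of ids, and the SQL is written by hand rather than generated by Django.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (id INTEGER);
    CREATE TABLE q2 (id INTEGER);
    INSERT INTO q1 VALUES (1);
    INSERT INTO q1 VALUES (2);
    INSERT INTO q1 VALUES (3);
    INSERT INTO q2 VALUES (2);
    INSERT INTO q2 VALUES (3);
    INSERT INTO q2 VALUES (4);
""")

# Plain difference: ids in q1 that are not in q2.
difference = [row[0] for row in conn.execute(
    "SELECT id FROM q1 EXCEPT SELECT id FROM q2 ORDER BY id"
)]

# Symmetric difference: (q1 - q2) UNION (q2 - q1), in one statement.
# Each half is wrapped in a subquery so the EXCEPTs are grouped explicitly.
symmetric = [row[0] for row in conn.execute("""
    SELECT id FROM (SELECT id FROM q1 EXCEPT SELECT id FROM q2)
    UNION
    SELECT id FROM (SELECT id FROM q2 EXCEPT SELECT id FROM q1)
    ORDER BY id
""")]
```

With the sample data, difference contains only 1, and symmetric contains 1 and 4.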