Suppose I want to run the exclude command repeatedly, taking values from exclude_list, e.g. ['aa', 'ab', 'ac'].
I can do that using a loop:
for exclude_value in exclude_list:
    myQueryset.exclude(variable__startswith=exclude_value)
However, I'd like to do that using the itertools.chain command as I've read it is capable of doing so. Any suggestions?
What you are doing is the correct approach, except for one small detail: you aren't keeping the result of each exclude() call, since queryset methods return a new queryset rather than modifying the one they're called on. Django querysets are also lazily evaluated, so running through a loop and continually chaining won't touch the database until you try to access something from the resulting queryset.
If you do this:
qs = MyModel.objects
for exclude_value in exclude_list:
    qs = qs.exclude(variable__startswith=exclude_value)
qs = None
The database is never hit.
So do this:
qs = MyModel.objects
for exclude_value in exclude_list:
    qs = qs.exclude(variable__startswith=exclude_value)
qs.count() # Or whatever you want the queryset for
and you should be fine. If/when you experience database slowdown, which will likely be because of the large number of free-text expressions in a single query, do some profiling; then you can look for an optimization.
But I'd wager the above code would be sufficient for your needs.
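If the number of chained excludes ever does become a problem, one possible refinement (a sketch, assuming the same MyModel, variable field and exclude_list names used above) is to OR the conditions together with Q objects and issue a single exclude(), which gives the same result as the chained version:

import operator
from functools import reduce

from django.db.models import Q

# Build one OR'd condition from all the prefixes, then exclude rows matching
# any of them -- equivalent to chaining one exclude() per value.
condition = reduce(operator.or_, (Q(variable__startswith=v) for v in exclude_list))
qs = MyModel.objects.exclude(condition)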
I'm trying to concatenate many querysets together. I tried the accepted answer from this question a while back, but that didn't work in my case: I needed to return a queryset, not a list. So I used the | operator from the second answer instead. That worked fine at the time, but now that I'm trying to use it again for something else I get the following error:
Expression tree is too large (maximum depth 1000)
I originally thought that | would concatenate the querysets, but after reading the docs it appears that it combines the underlying queries, and that this specific error occurs when the resulting query becomes too long/complex.
This is what I'm trying to do:
def properties(self, request, pk=None):
    project = self.get_object()
    if project is None:
        return Response({'detail': 'Missing project id'}, status=404)
    functions = Function.objects.filter(project=project)
    properties = Property.objects.none()
    for function in functions:
        properties = properties | function.property_set.all()
    return Response([PropertySerializer(x).data for x in properties])
Since the functions query returns roughly 1200 results, and each function has about 5 properties, I can understand the query becoming too long/complex.
How can I prevent the query from becoming too complex? Or how can I execute multiple queries and concat them afterwards, while keeping the end result a queryset?
I think you want to obtain all the Property objects whose Function belongs to a certain project.
We can query this with:
properties = Property.objects.filter(function__project=project)
This is thus a queryset containing all Property objects for which the function (I assume this is a ForeignKey) has the given project as its project (probably again a ForeignKey). It will still be a single query, but you avoid constructing gigantic unions.
Alternatively, you can do it in two steps, but this would actually make it slower:
# probably less efficient
function_ids = (Function.objects.filter(project=project)
                .values_list('pk', flat=True))
properties = Property.objects.filter(function_id__in=function_ids)
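For completeness, here is roughly how the view from the question could look with the single-query filter (a sketch reusing the names from the question, and using the serializer's many=True form, which should be equivalent to the list comprehension):

def properties(self, request, pk=None):
    project = self.get_object()
    if project is None:
        return Response({'detail': 'Missing project id'}, status=404)
    # One query instead of ~1200 unioned property_set queries
    properties = Property.objects.filter(function__project=project)
    return Response(PropertySerializer(properties, many=True).data)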
Does the order in which select_related() is put in a queryset chain matter?
i.e. is there any difference between:
SomeModel.objects.select_related().all()
and
SomeModel.objects.all().select_related()
In my brief testing they both seem to cache objects, but I'm wondering if there are any performance differences or anything else I'm not realizing is different.
They both execute the same exact query. So no, there would be no performance differences.
To test, try this:
q = SomeModel.objects.select_related().all()
print q.query
q = SomeModel.objects.all().select_related()
print q.query
You should get the exact same query.
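Equivalently, you can compare the rendered SQL directly (a small sketch; str(qs.query) is roughly the SQL Django would send to the database):

q1 = SomeModel.objects.select_related().all()
q2 = SomeModel.objects.all().select_related()
# Both chains build the same underlying SQL
assert str(q1.query) == str(q2.query)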
I have two querysets, alllists and subscriptionlists:
alllists = List.objects.filter(datamode = 'A')
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
I need a queryset called unsubscriptionlist, which contains all records in alllists except the records in subscriptionlists. How do I achieve this?
Since Django 1.11, QuerySets have a difference() method amongst other new methods:
# Capture elements that are in qs_all but not in qs_part
qs_diff = qs_all.difference(qs_part)
Also see: https://stackoverflow.com/a/45651267/5497962
You should be able to use the set operation difference to help:
set(alllists).difference(set(subscriptionlists))
Well I see two options here.
1. Filter things manually (quite ugly)
diff = []
for all in alllists:
    found = False
    for sub in subscriptionlists:
        if sub.id == all.id:
            found = True
            break
    if not found:
        diff.append(all)
2. Just make another query
diff = List.objects.filter(datamode = 'A').exclude(member__id=memberid, datamode='A')
How about:
subscriptionlists = Membership.objects.filter(member__id=memberid, datamode='A')
unsubscriptionlists = Membership.objects.exclude(member__id=memberid, datamode='A')
The unsubscriptionlists should be the inverse of subscriptionlists.
Brian's answer will work as well, though set() will most likely evaluate the queries and take a performance hit loading both sets into memory. This method keeps the lazy evaluation until you actually need the data.
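If you need unsubscriptionlist to stay a lazy List queryset rather than a Membership queryset or a Python set, one way to express it (a sketch that assumes Membership has a ForeignKey to List named list; adjust the field names to your models) is:

# Assumes Membership.list is a ForeignKey to List (hypothetical field name)
subscribed_list_ids = (Membership.objects
                       .filter(member__id=memberid, datamode='A')
                       .values_list('list_id', flat=True))
unsubscriptionlist = List.objects.filter(datamode='A').exclude(pk__in=subscribed_list_ids)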
In case anyone's searching for a way to do a symmetric difference, such an operator is not available in Django.
That said, it's not that hard to implement it using difference and union, and it'll all be done in a single query:
q1.difference(q2).union(q2.difference(q1))
Imagine you have the following situation:
for i in xrange(100000):
    account = Account()
    account.foo = i
    account.save()
Obviously, the 100,000 INSERT statements executed by Django are going to take some time. It would be nicer to be able to combine all those INSERTs into one big INSERT. Here's the kind of thing I'm hoping I can do:
inserts = []
for i in xrange(100000):
    account = Account()
    account.foo = i
    inserts.append(account.insert_sql)
sql = 'INSERT INTO whatever... ' + ', '.join(inserts)
Is there a way to do this using QuerySet, without manually generating all those INSERT statements?
As shown in this related question, one can use the @transaction.commit_manually decorator to combine all the .save() operations into a single commit, which greatly improves performance.
from django.db import transaction

@transaction.commit_manually
def your_view(request):
    try:
        for i in xrange(100000):
            account = Account()
            account.foo = i
            account.save()
    except:
        transaction.rollback()
    else:
        transaction.commit()
Alternatively, if you're feeling adventurous, have a look at this snippet which implements a manager for bulk inserting. Note that it works only with MySQL, and hasn't been updated in a while so it's hard to tell if it will play nice with newer versions of Django.
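If you are on Django 1.4 or later, the ORM also provides bulk_create(), which batches the INSERTs for you; a minimal sketch:

# bulk_create() builds the rows in memory and inserts them in a few
# large INSERT statements (batch_size controls how many per statement).
accounts = [Account(foo=i) for i in range(100000)]
Account.objects.bulk_create(accounts, batch_size=1000)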
You could use raw SQL.
Either via Account.objects.raw() or by using the django.db.connection object.
This might not be an option if you want to maintain database agnosticism.
http://docs.djangoproject.com/en/dev/topics/db/sql/
If what you're doing is a one-time setup, perhaps using a fixture would be better.
In my AppEngine project I need to use a certain filter as a base and then apply various extra filters on top of it, retrieving the different result sets separately, e.g.:
base_query = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various subqueries separately:
subquery1 = base_query.filter('subfilter1', 'xyz')
# Do something with subquery1 results here
subquery2 = base_query.filter('subfilter2', 'abc')
# Do something with subquery2 results here
Unfortunately, filter() affects the state of the base_query Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duplicating an object that could be used?
The extra filters are actually applied dynamically from the results of different forms within a wizard, and each branch uses the 'running total' of its query to decide whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.
There's no officially approved (i.e., not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.
As Nick has said, you are better off creating the query again, but you can still avoid repeating yourself. A good way to do that would be something like this:
# inside a request handler
def create_base_query():
    return MyModel.all().filter('mainfilter', 123)

subquery1 = create_base_query().filter('subfilter1', 'xyz')
# Do something with subquery1 results here
subquery2 = create_base_query().filter('subfilter2', 'abc')
# Do something with subquery2 results here