How to chain Django querysets preserving individual order - python

I'd like to append or chain several Querysets in Django, preserving the order of each one (not the result). I'm using a third-party library to paginate the result, and it only accepts lists or querysets. I've tried these options:
Queryset join: Doesn't preserve ordering in individual querysets, so I can't use this.
result = queryset_1 | queryset_2
Using itertools: calling list() on the chain object evaluates every queryset, which could cause a lot of overhead, couldn't it?
result = list(itertools.chain(queryset_1, queryset_2))
How should I approach this?
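One option not listed in the question is a thin wrapper that presents several sequences as one and only slices the parts a page actually needs. The sketch below is illustrative: the class name ChainedSequences and the helper _size are made up for this example, plain lists stand in for querysets, and whether a given third-party paginator accepts a duck-typed object like this is an assumption you would have to check.

```python
def _size(part):
    """Return the number of items in a list-like or queryset-like object."""
    try:
        # Querysets expose a no-argument count() that issues a COUNT query.
        return part.count()
    except (AttributeError, TypeError):
        # list.count() requires an argument, so fall back to len().
        return len(part)


class ChainedSequences:
    """Present several sliceable sequences as one, keeping each part's order."""

    def __init__(self, *parts):
        self.parts = parts
        self._sizes = None  # computed on first use, then cached

    def _get_sizes(self):
        if self._sizes is None:
            self._sizes = [_size(part) for part in self.parts]
        return self._sizes

    def count(self):
        return sum(self._get_sizes())

    __len__ = count

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = key.indices(self.count())
        if step != 1:
            raise ValueError("this sketch only supports a step of 1")
        items = []
        offset = 0  # global index of the first item of the current part
        for part, size in zip(self.parts, self._get_sizes()):
            lo = max(start - offset, 0)
            hi = min(stop - offset, size)
            if lo < hi:
                # Only this sub-range of the part is fetched.
                items.extend(part[lo:hi])
            offset += size
        return items


pages = ChainedSequences([1, 2, 3], [10, 20], [100])
print(pages[2:5])  # [3, 10, 20]
```

The sizes are cached after the first call, so this sketch assumes the underlying data does not change while you paginate.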

This solution prevents duplicates:
from django.db.models import Case, IntegerField, Q, Value, When
q1 = Q(...)
q2 = Q(...)
q3 = Q(...)
qs = (
    Model.objects
    .filter(q1 | q2 | q3)
    .annotate(
        # Each row gets the rank of the first condition it matches
        search_type_ordering=Case(
            When(q1, then=Value(2)),
            When(q2, then=Value(1)),
            When(q3, then=Value(0)),
            default=Value(-1),
            output_field=IntegerField(),
        )
    )
    .order_by('-search_type_ordering', ...)
)
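To see why this avoids duplicates, it may help to look at the general shape of SQL such a query produces: a single SELECT with a CASE expression, so each row appears once and is ranked by the first condition it matches. The following is a rough sketch using the standard-library sqlite3 module; the item table and the LIKE conditions are invented for illustration and are not what Django generates verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO item (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "django tips", "django orm"),  # matches both conditions
        (2, "misc", "django notes"),       # matches the body condition only
        (3, "django news", "other"),       # matches the title condition only
        (4, "unrelated", "nothing"),       # matches neither, filtered out
    ],
)

rows = conn.execute(
    """
    SELECT id,
           CASE
               WHEN title LIKE '%django%' THEN 1
               WHEN body  LIKE '%django%' THEN 0
               ELSE -1
           END AS search_type_ordering
    FROM item
    WHERE title LIKE '%django%' OR body LIKE '%django%'
    ORDER BY search_type_ordering DESC, id
    """
).fetchall()

print(rows)  # [(1, 1), (3, 1), (2, 0)]
```

Row 1 matches both conditions but is returned only once, with the rank of the first matching branch.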

If the querysets are of different models, you have to evaluate them to lists and then you can just append:
result = list(queryset_1) + list(queryset_2)
If they are the same model, you should combine the queries using the Q object and 'order_by("queryset_1 field", "queryset_2 field")'.
The right answer largely depends on why you want to combine these and how you are going to use the results.

So, inspired by Peter's answer this is what I did in my project (Django 2.2):
from django.db import models
from .models import MyModel
# Add an extra field to each query with a constant value
queryset_0 = MyModel.objects.annotate(
    qs_order=models.Value(0, models.IntegerField())
)
# Each constant should basically act as the position where we want the
# queryset to stay
queryset_1 = MyModel.objects.annotate(
    qs_order=models.Value(1, models.IntegerField())
)
[...]
queryset_n = MyModel.objects.annotate(
    qs_order=models.Value(n, models.IntegerField())
)
# Finally, I ordered the union result by that extra field.
union = queryset_0.union(
    queryset_1,
    queryset_2,
    [...],
    queryset_n).order_by('qs_order')
With this, I could order the resulting union as I wanted without changing any private attribute while only evaluating the querysets once.
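One thing worth checking with this approach: ordering by the constant alone only groups the rows by source queryset. To also keep the order inside each group, the per-queryset sort key probably has to be part of the final ordering as well. Here is a small sketch of the equivalent SQL using sqlite3; the mymodel table and its name and active columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO mymodel VALUES (?, ?, ?)",
    [(1, "b", 1), (2, "a", 1), (3, "d", 0), (4, "c", 0)],
)

# Each branch carries a constant marking its position in the final result.
# The secondary key (name) keeps the order inside each branch.
rows = conn.execute(
    """
    SELECT id, name, 0 AS qs_order FROM mymodel WHERE active = 1
    UNION
    SELECT id, name, 1 AS qs_order FROM mymodel WHERE active = 0
    ORDER BY qs_order, name
    """
).fetchall()

print(rows)  # [(2, 'a', 0), (1, 'b', 0), (4, 'c', 1), (3, 'd', 1)]
```

In ORM terms that would correspond to something like .order_by('qs_order', 'name') on the union. Also note that a row matching two branches would come back twice, because the differing constant makes the two copies distinct for UNION.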

I'm not 100% sure this solution works in every possible case, but it looks like the result is the union of two QuerySets (on the same model) preserving the order of the first one:
union = qset1.union(qset2)
union.query.extra_order_by = qset1.query.extra_order_by
union.query.order_by = qset1.query.order_by
union.query.default_ordering = qset1.query.default_ordering
union.query.get_meta().ordering = qset1.query.get_meta().ordering
I did not test it extensively, so before you use that code in production, make sure it behaves as expected.

If you need to merge two querysets into a third queryset, here is an example, using _result_cache.
model
class ImportMinAttend(models.Model):
    country = models.CharField(max_length=2, blank=False, null=False)
    status = models.CharField(max_length=5, blank=True, null=True, default=None)
From this model, I want to display a list of all the rows such that:
(query 1) rows with an empty status go first, ordered by country
(query 2) rows with a non-empty status go second, ordered by country
I want to merge query 1 and query 2.
# Get all the objects
queryset = ImportMinAttend.objects.all()
# Get the first queryset
queryset_1 = queryset.filter(status=None).order_by("country")
# Call len() (or anything else that hits the database) to populate _result_cache
len(queryset_1)
# Get the second queryset
queryset_2 = queryset.exclude(status=None).order_by("country")
# Append the second queryset to the first one AND PRESERVE ORDER
# (_result_cache is a private attribute of QuerySet)
for query in queryset_2:
    queryset_1._result_cache.append(query)
# Final result
queryset = queryset_1
It might not be very efficient, but it works :).

For Django 1.11 (released on April 4, 2017) use union() for this, documentation here:
https://docs.djangoproject.com/en/1.11/ref/models/querysets/#django.db.models.query.QuerySet.union
Here is the Version 2.1 link to this:
https://docs.djangoproject.com/en/2.1/ref/models/querysets/#union

Use the union() method to combine multiple querysets, rather than the OR (|) operator. This avoids a very inefficient OUTER JOIN query that reads the entire table.

If two querysets have a common field, you can order the combined queryset by that field. The querysets are not evaluated during this operation.
For example:
class EventsHistory(models.Model):
    id = models.IntegerField(primary_key=True)
    event_time = models.DateTimeField()
    event_id = models.IntegerField()
class EventsOperational(models.Model):
    id = models.IntegerField(primary_key=True)
    event_time = models.DateTimeField()
    event_id = models.IntegerField()
qs1 = EventsHistory.objects.all()
qs2 = EventsOperational.objects.all()
qs_combined = qs2.union(qs1).order_by('event_time')

Related

Order by with specific rows first

I have a generic ListView in Django 1.11 and I need to return the objects in alphabetical order, but with two specific ones first:
class LanguageListAPIView(generics.ListCreateAPIView):
    queryset = Language.objects.all().order_by("name")
    serializer_class = LanguageSerializer
with the following Language model:
class Language(models.Model):
    name = models.CharField(max_length=50, unique=True)
I'd like to return ENGLISH, FRENCH, then every other language in the database ordered by name.
Is there a way to achieve this with the Django ORM?
Thank you,
Maybe you can use two querysets and combine them to obtain the result:
q1 = Language.objects.filter(Q(name='ENGLISH') | Q(name='FRENCH')).order_by('name')
and
q2 = Language.objects.filter(~(Q(name='ENGLISH') | Q(name='FRENCH'))).order_by('name')
Then join the querysets:
queryset = list(chain(q1, q2))
Import Q from django.db.models and chain from itertools.
Since Django 1.8 you can use Conditional Expressions:
from django.db.models import Case, When, Value, IntegerField
Language.objects.annotate(
    order=Case(
        When(name="ENGLISH", then=Value(1)),
        When(name="FRENCH", then=Value(2)),
        default=Value(3),
        output_field=IntegerField(),
    )
).order_by('order', 'name')
This annotates a field called order, then sorts the results first by the order field and then by the name field. ENGLISH and FRENCH get a lower order value; all other languages share the same value, so they are sorted by name only.
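The same two-level ordering can be illustrated in plain Python, which may make the idea easier to see. This is only an analogy for what the database does with the annotated column; the PRIORITY mapping and the sample names are made up for the example.

```python
# Pinned names get a small rank; everything else shares the default rank 3.
PRIORITY = {"ENGLISH": 1, "FRENCH": 2}

names = ["SPANISH", "FRENCH", "ARABIC", "ENGLISH", "GERMAN"]

# Sort by (rank, name): the rank decides first, the name breaks ties.
ordered = sorted(names, key=lambda name: (PRIORITY.get(name, 3), name))

print(ordered)  # ['ENGLISH', 'FRENCH', 'ARABIC', 'GERMAN', 'SPANISH']
```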

Django ORM - find objects that a variable fits inbetween a range of model fields?

I am trying to find all Django objects for which an integer variable falls between two model fields. I understand that __range is normally used to check whether a model field falls between two variables, but I need it the other way around.
models:
class Location(models.Model):
    location_start = models.IntegerField()
    location_end = models.IntegerField()
    sample_id = models.ForeignKey(Sample,
        on_delete=models.CASCADE, db_column='sample_id')
views (doesn't work):
location_query = 1276112
loc_obj = Location.objects.filter(
    sample_id=sample_obj,
    location_query__range(location_start, location_end)
)
Raw SQL:
SELECT *
FROM location
WHERE sample_id=12
AND 1276112 BETWEEN location_start AND location_end
Is there an easier way to do this without looping through the objects?
If I understand you correctly you want to filter all Location objects with obj.location_start < location_query < obj.location_end. The filter statement for that would look like this:
loc_obj = Location.objects.filter(
    sample_id=sample_obj,
    location_start__lt=location_query,
    location_end__gt=location_query,
)
If you want an inclusive range (<=), use location_start__lte and location_end__gte instead. SQL's BETWEEN is inclusive, so that is the variant that matches the raw query.
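To check how the strict and inclusive variants relate to the BETWEEN in the raw SQL, here is a small sqlite3 sketch. The table mirrors the Location model, the sample rows are invented, and the helper ids() is just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (id INTEGER PRIMARY KEY, sample_id INTEGER, "
    "location_start INTEGER, location_end INTEGER)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?)",
    [
        (1, 12, 1276000, 1276200),  # query value strictly inside the range
        (2, 12, 1276112, 1276500),  # query value equals location_start
        (3, 12, 1300000, 1400000),  # query value outside the range
        (4, 99, 1276000, 1276200),  # belongs to another sample
    ],
)
location_query = 1276112


def ids(where, params):
    """Return the matching ids for sample 12 under the given condition."""
    sql = "SELECT id FROM location WHERE sample_id = 12 AND " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


between_ids = ids("? BETWEEN location_start AND location_end", (location_query,))
# Corresponds to location_start__lte / location_end__gte
inclusive_ids = ids(
    "location_start <= ? AND location_end >= ?", (location_query, location_query)
)
# Corresponds to location_start__lt / location_end__gt
exclusive_ids = ids(
    "location_start < ? AND location_end > ?", (location_query, location_query)
)

print(between_ids, inclusive_ids, exclusive_ids)  # [1, 2] [1, 2] [1]
```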
How about this (using lte and gte):
loc_obj = Location.objects.filter(
    sample_id=sample_obj,
    location_start__lte=location_query,
    location_end__gte=location_query,
)

How to sort queryset by annotated attr from ManyToMany field

Simplest example:
class User(models.Model):
    name = ...
class Group(models.Model):
    members = models.ManyToManyField(User, through='GroupMembership')
class GroupMembership(models.Model):
    user = ...
    group = ...
I want to get a list of Groups ordered by an annotated field of their members.
I'm using trigram search to filter and annotate the User queryset.
To get annotated users I have something like this:
User.objects.annotate(...).annotate(similarity=...)
And now I'm trying to sort the Group queryset by the users' "similarity":
ann_users = User.objects.annotate(...).annotate(similarity=...)
qs = Group.objects.prefetch_related(Prefetch('members',
    queryset=ann_users))
qs.annotate(similarity=Max('members__similarity')).order_by('similarity')
But it doesn't work, because prefetch_related does the ‘joining’ in Python; so I have the error:
"FieldError: Cannot resolve keyword 'members' into field."
I assume you have a database function for trigram similarity of names and a Django binding for it, or that you create one:
from django.db.models import Max, Func, Value, Prefetch
class Similarity(Func):
    function = 'SIMILARITY'
    arity = 2
SEARCHED_NAME = 'searched_name'
ann_users = User.objects.annotate(similarity=Similarity('name', Value(SEARCHED_NAME)))
qs = Group.objects.prefetch_related(Prefetch('members', queryset=ann_users))
qs = qs.annotate(
    similarity=Max(Similarity('members__name', Value(SEARCHED_NAME)))
).order_by('similarity')
The main query is compiled to
SELECT app_group.id, MAX(SIMILARITY(app_user.name, %s)) AS similarity
FROM app_group
LEFT OUTER JOIN app_groupmembership ON (app_group.id = app_groupmembership.group_id)
LEFT OUTER JOIN app_user ON (app_groupmembership.user_id = app_user.id)
GROUP BY app_group.id
ORDER BY similarity ASC;
-- params: ['searched_name']
It is not exactly what you want in the title, but the result is the same.
Note: how many times the SIMILARITY function is evaluated depends on the database query optimizer. It would be interesting to compare the query plans from the EXPLAIN command to see whether the original idea, written as a raw query for some simplified case, performs better.

Efficiently counting leaves in a tree stored in a db

I have a Django project that has two models: Group and Person. Groups can contain either Person objects or other Group objects. Groups cannot form a cycle (i.e. Group A containing Group B containing Group A), which results in a tree structure where Person objects are leaves.
My question is - how can I count all the contained Group objects and Person objects within a high level Group (like the root Group) with as few SQL queries as possible?
A naive approach with O(N) (where N is # of subgroups) SQL queries would be:
import operator
class Group(models.Model):
    name = models.CharField(max_length=150)
    parent_group = models.ForeignKey('self', related_name='child_groups', null=True, blank=True)
    # Returns a tuple (# of subgroups, # of person objects)
    def count_objects(self):
        count = (self.child_groups.count(), self.people.count())
        for child_group in self.child_groups.all():
            # Add the tuples element-wise, e.g. (1, 2) and (1, 2) make (2, 4)
            count = tuple(map(operator.add, count, child_group.count_objects()))
        return count
class Person(models.Model):
    user = models.ForeignKey(User)
    picture = models.ImageSpecField(...)
    group = models.ForeignKey('Group', related_name="people")
Is there a way to improve this or should I just store these values within the Group object?
So this is an existing problem that many others have tackled. If you're using Django, check this out:
http://django-mptt.github.com/django-mptt/index.html
Within Postgres you could use recursive queries, although there is no direct support for this in Django.
Alternatively you could consider denormalising the count, possibly there are libraries to do this. A quick google gave me: http://pypi.python.org/pypi/django-composition/
If you have to select the same values quite often and they don't change that much, you could try caching them.
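For what it's worth, the recursive-query idea mentioned above can be sketched with a WITH RECURSIVE common table expression. The example below uses sqlite3 so it runs without a server; PostgreSQL supports the same WITH RECURSIVE construct. The table names app_group and app_person and the sample data are assumptions for illustration, not the schema Django would necessarily create for the models in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_group (
        id INTEGER PRIMARY KEY, name TEXT, parent_group_id INTEGER
    );
    CREATE TABLE app_person (id INTEGER PRIMARY KEY, group_id INTEGER);
    INSERT INTO app_group VALUES
        (1, 'root', NULL), (2, 'a', 1), (3, 'b', 1), (4, 'a1', 2);
    INSERT INTO app_person VALUES (1, 1), (2, 2), (3, 4), (4, 4), (5, 3);
    """
)


def count_objects(root_id):
    """Return (# of subgroups, # of people) under root_id in one query."""
    row = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM app_group WHERE id = ?
            UNION ALL
            SELECT g.id
            FROM app_group g
            JOIN subtree s ON g.parent_group_id = s.id
        )
        SELECT
            -- subtract 1 so the root group itself is not counted
            (SELECT COUNT(*) FROM subtree) - 1,
            (SELECT COUNT(*) FROM app_person
             WHERE group_id IN (SELECT id FROM subtree))
        """,
        (root_id,),
    ).fetchone()
    return tuple(row)


print(count_objects(1))  # (3, 5)
```

This replaces the per-subgroup queries of the naive approach with a single statement, at the cost of dropping down to raw SQL.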

Query Django model based on relationship between ForeignKey sub elements

class Price(models.Model):
    date = models.DateField()
    price = models.DecimalField(max_digits=6, decimal_places=2)
    product = models.ForeignKey("Product")
class Product(models.Model):
    name = models.CharField(max_length=256)
    price_history = models.ManyToManyField(Price, related_name="product_price", blank=True)
I want to query Product so that it returns only those products whose price on date x is higher than on any earlier date.
Thanks boffins.
As Marcin said in another answer, you can drill down across relationships using the double-underscore syntax. However, you can also chain filters, and sometimes this is easier to understand logically, even though it leads to more lines of code. In your case, I might do something like this.
First you want to know the price on date x:
a = Product.objects.filter(price_history__date=somedate_x)
You should probably check whether there is more than one per date:
if a.count() == 1:
    pass
else:
    pass  # do something else here
(or something like that)
Now you have your price and you know your date, so just do this:
b = Product.objects.filter(price_history__date__lt=somedate_x, price_history__price__gt=a[0].price)
Note that the slice will hit the database on its own and return an object. So this code will hit the database three times per function call: once for the count, once for the slice, and once for the actual query. You could forgo the count and the slice by using an aggregate function (like an average across all the rows returned for a day), but those can get expensive in their own right.
For more information, see the QuerySet API:
https://docs.djangoproject.com/en/dev/ref/models/querysets/
You can perform a query that spans relationships using this syntax:
Product.objects.filter(price_history__price = 3)
However, I'm not sure that it's possible to perform the query you want efficiently in a pure django query.
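As a reference point for what the database would have to do, here is a sketch of the query in plain SQL, run through sqlite3. It selects products that have a price on date x and no earlier price that is greater or equal. The table and column names (product, price, price_date, amount_cents) and the data are hypothetical, and prices are stored as integer cents to keep comparisons exact. In the ORM, a similar shape might be expressible with Exists and OuterRef subqueries (available since Django 1.11), but that is not shown or tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (
        id INTEGER PRIMARY KEY, product_id INTEGER,
        price_date TEXT, amount_cents INTEGER
    );
    INSERT INTO product VALUES
        (1, 'widget'), (2, 'gadget'), (3, 'gizmo'), (4, 'doohickey');
    INSERT INTO price VALUES
        (1, 1, '2020-01-01', 100), (2, 1, '2020-02-01', 120),
        (3, 1, '2020-03-01', 150),
        (4, 2, '2020-01-01', 200), (5, 2, '2020-03-01', 180),
        (6, 3, '2020-03-01', 50),
        (7, 4, '2020-01-01', 70);
    """
)

date_x = "2020-03-01"  # ISO dates compare correctly as strings

rows = conn.execute(
    """
    SELECT p.id, p.name
    FROM product p
    JOIN price cur ON cur.product_id = p.id AND cur.price_date = ?
    WHERE NOT EXISTS (
        SELECT 1
        FROM price earlier
        WHERE earlier.product_id = p.id
          AND earlier.price_date < cur.price_date
          AND earlier.amount_cents >= cur.amount_cents
    )
    ORDER BY p.id
    """,
    (date_x,),
).fetchall()

print(rows)  # [(1, 'widget'), (3, 'gizmo')]
```

Note that a product with no earlier prices at all (gizmo here) passes the check vacuously; whether that is desired depends on the use case.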
