class Price(models.Model):
    date = models.DateField()
    price = models.DecimalField(max_digits=6, decimal_places=2)
    product = models.ForeignKey("Product")

class Product(models.Model):
    name = models.CharField(max_length=256)
    price_history = models.ManyToManyField(Price, related_name="product_price", blank=True)
I want to query Product so that it returns only those products whose price on date x is higher than on any earlier date.
Thanks boffins.
As Marcin said in another answer, you can drill down across relationships using the double-underscore syntax. You can also chain filters, which is sometimes easier to follow logically even though it takes more lines of code. In your case, though, I might do something like this.
First, you want to know the price on date x:
a = Product.objects.filter(price_history__date=somedate_x)
You should probably check whether there is more than one match for that date:
if a.count() == 1:
    pass
else:
    ...  # do something else here
(or something like that)
Now you have your price and you know your date, so just do this:
b = Product.objects.filter(price_history__date__lt=somedate_x, price_history__price__gt=a[0].price)
Be aware that the slice hits the database on its own and returns an object. So this code hits the database three times per function call: once for the count, once for the slice, and once for the actual query. You could forgo the count and the slice by using an aggregate function (like an average across all the rows returned for a day), but those can get expensive in their own right.
For more information, see the queryset API:
https://docs.djangoproject.com/en/dev/ref/models/querysets/
You can perform a query that spans relationships using this syntax:
Product.objects.filter(price_history__price = 3)
However, I'm not sure that it's possible to perform the query you want efficiently in a pure Django query.
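For what it's worth, the condition itself can be written as a single SQL statement. The following is only a sketch that uses Python's sqlite3 module rather than the Django ORM; the price table, its columns, the sample rows and the products_at_new_high helper are made up for illustration and do not come from the models above.

```python
import sqlite3

# Hypothetical stand-in for the Price table: one row per product per date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price (product_id INTEGER, date TEXT, price REAL);
    INSERT INTO price VALUES
        (1, '2024-01-01', 10.0), (1, '2024-01-02', 12.0), (1, '2024-01-03', 15.0),
        (2, '2024-01-01', 20.0), (2, '2024-01-02', 25.0), (2, '2024-01-03', 22.0);
""")


def products_at_new_high(conn, date_x):
    """Return ids of products whose price on date_x beats every earlier price."""
    rows = conn.execute(
        """
        SELECT p.product_id
        FROM price AS p
        WHERE p.date = ?
          AND NOT EXISTS (
              -- any earlier price that is at least as high disqualifies the product
              SELECT 1 FROM price AS earlier
              WHERE earlier.product_id = p.product_id
                AND earlier.date < p.date
                AND earlier.price >= p.price
          )
        ORDER BY p.product_id
        """,
        (date_x,),
    ).fetchall()
    return [row[0] for row in rows]


print(products_at_new_high(conn, "2024-01-03"))  # [1]
```

Product 2 drops out on 2024-01-03 because its earlier price of 25 was higher. Note that a product with no earlier prices at all passes the check trivially, which may or may not be what you want.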
I have Model "A" that both relates to another model and acts as a public face to the actual data (Model "B"), users can modify the contents of A but not of B.
For every B there can be many As, and they have a one to many relation.
When I display this model, any time there are two or more As related to a B I see "duplicate" records with (almost always) the same data, which is a bad experience.
I want to return a queryset of A items that relate to the B items, and when there's more than one roll them up to the first entered item.
I also want to count the related model B items and return that count to give me an indication of how much duplication is available.
I wrote the following analogous SQL query which counts the related items and uses first_value to find the first A created partitioned by B.
SELECT *
FROM
(
    SELECT
        COUNT(*) OVER (PARTITION BY b_id) AS count_related_items,
        FIRST_VALUE(id) OVER (PARTITION BY b_id ORDER BY created_time ASC) AS first_filter,
        *
    FROM A
) AS A1
WHERE
    A1.first_filter = A1.id;
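To make the intent of that query concrete, here is a minimal plain-Python sketch of the same roll-up: group the rows by b_id, keep the earliest one in each group, and attach the group size. The sample rows and the roll_up helper are hypothetical, purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for table A: (id, b_id, created_time)
rows = [
    (1, 10, "2024-01-01"),
    (2, 10, "2024-01-05"),
    (3, 20, "2024-01-02"),
    (4, 10, "2024-01-03"),
]


def roll_up(rows):
    """Keep the earliest row per b_id and count how many rows share that b_id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)

    result = []
    for b_id, members in groups.items():
        first = min(members, key=lambda r: r[2])  # earliest created_time
        result.append(
            {"id": first[0], "b_id": b_id, "count_related_items": len(members)}
        )
    return sorted(result, key=lambda r: r["b_id"])


for item in roll_up(rows):
    print(item)
```

With the sample data, b_id 10 collapses to row 1 with a count of 3, and b_id 20 stays as row 3 with a count of 1.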
As requested, here's a simplified view of the models:
class CoreData(models.Model):
    title = models.CharField(max_length=500)

class UserData(models.Model):
    core = models.ForeignKey("CoreData", on_delete=models.CASCADE)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
When a user creates data it first checks/creates the CoreData, storing things like the title, and then it creates the UserData, with a reference to the CoreData.
The "duplication" is introduced when a second user creates a piece of data that references the same CoreData. That is why you can roll up the UserData (in SQL) to find the count and the "first" entry in the one-to-many relation.
Assuming my understanding is correct:
If you are querying from the UserData model, the query would look something like this, taking CoreData.id = 18:
from django.db.models import Count
user_data = (UserData.objects.filter(core__id=18)
    .order_by("created_time")
    .annotate(duplicate_count=Count("core__userdata")).first())
user_data is the first object created that is related to the CoreData object (the reverse lookup name is the lowercased model name, userdata). Also,
user_data.duplicate_count gives you the count of UserData objects that are related to that CoreData object.
See the Django documentation on annotate() for reference.
Update:
If you need the list of UserData for a specific CoreData, you could use:
user_data = (UserData.objects.filter(core__id=18)
    .order_by("created_time")
    .annotate(duplicate_count=Count("core__userdata")))
Simplifying my model a lot, I have the following:
class Player(models.Model):
    name = models.CharField(max_length=50)
    number = models.IntegerField()

class Statistic(models.Model):
    '''
    Known codes are:
    - goals
    - assists
    - red_cards
    '''
    # Implicit ID
    player = models.ForeignKey(
        'Player', on_delete=models.CASCADE, related_name='statistics')
    code = models.CharField(max_length=50)
    value = models.CharField(max_length=50, null=True)
I'm using a code-value strategy to add different statistics in the future, without the need of adding new fields to the model.
Now, I want to filter the players based on some statistics, for example, players who scored between 10 and 15 goals.
I'm trying something like this:
.filter('statistics__code'='goals').filter('statistics__value__range'=[10,15])
but I'm getting duplicated players, I'm guessing because that value__range could refer to any Statistic.
How could I properly filter the queryset or avoid those duplicates?
And how could I filter by more than one statistic, for example, players who scored between 10 and 15 goals and have more than 5 assists?
By the way, note that the value field (in Statistic) is a string, and it will need to be treated as an integer in some scenarios (when using __range, for example).
You don't need to chain the filters. Call filter() once with both conditions, then call distinct():
.filter(statistics__code='goals', statistics__value__range=[10,15]).distinct()
NOTE: I can see quotes around statistics__code and statistics__value__range in your code. Remove them, since keyword argument names must not be quoted.
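As for why the chained version over-matches: on a multi-valued relation, each chained filter() call gets its own join to the related table, so the two conditions can be satisfied by different Statistic rows, whereas conditions inside one filter() call must hold on the same row. The sketch below reproduces both shapes in raw SQL via sqlite3. The table layout, the sample data and the CAST (used to compare the string value numerically) are assumptions for illustration, not what Django literally emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE statistic (player_id INTEGER, code TEXT, value TEXT);
    INSERT INTO player VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO statistic VALUES
        (1, 'goals', '12'), (1, 'assists', '7'),
        (2, 'goals', '3'),  (2, 'assists', '11');
""")

# One join: both conditions must hold on the same statistic row
# (the shape of a single filter() call).
same_row = conn.execute("""
    SELECT DISTINCT p.name FROM player p
    JOIN statistic s ON s.player_id = p.id
    WHERE s.code = 'goals' AND CAST(s.value AS INTEGER) BETWEEN 10 AND 15
    ORDER BY p.name
""").fetchall()

# Two joins: each condition may be satisfied by a different statistic row
# (the shape of two chained filter() calls).
any_row = conn.execute("""
    SELECT DISTINCT p.name FROM player p
    JOIN statistic s1 ON s1.player_id = p.id
    JOIN statistic s2 ON s2.player_id = p.id
    WHERE s1.code = 'goals' AND CAST(s2.value AS INTEGER) BETWEEN 10 AND 15
    ORDER BY p.name
""").fetchall()

print(same_row)  # [('Ann',)]
print(any_row)   # [('Ann',), ('Bob',)]
```

Bob only scored 3 goals, but he matches the two-join query because his 11 assists fall in the 10 to 15 range. Filtering on a second statistic (say, more than 5 assists) is the case where you do want a second join, one per statistic code.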
I have a model that tracks the number of impressions for ads.
class Impression(models.Model):
    ad = models.ForeignKey(Ad, on_delete=models.CASCADE)
    user_ip = models.CharField(max_length=50, null=True, blank=True)
    clicked = models.BooleanField(default=False)
    time_created = models.DateTimeField(auto_now_add=True)
I want to find every user_ip that has more than 1000 impressions, in other words every user_ip that appears in more than 1000 instances of Impression. How can I do that? I wrote a function for this, but it is very inefficient and slow because it loops over every impression.
def check_ip():
    for i in Impression.objects.all():
        if Impression.objects.filter(user_ip=i.user_ip).count() > 1000:
            print(i.user_ip)
You should be able to do this in one query with aggregation. It is possible to filter on aggregate values (like Count()) as follows:
from django.db.models import Count
for ip in Impression.objects.values('user_ip').annotate(ipcount=Count('user_ip')).filter(ipcount__gt=1000):
    ...  # do something with ip['user_ip'] and ip['ipcount']
Django querysets have an annotate() method which supports what you're trying to do.
from django.db.models import Count
Impression.objects.values('user_ip')\
    .annotate(ip_count=Count('user_ip'))\
    .filter(ip_count__gt=1000)
This will give you a queryset which returns dictionaries with 'user_ip' and 'ip_count' keys when used as an iterable.
To understand what's happening here you should look at Django's aggregation guide: https://docs.djangoproject.com/en/1.11/topics/db/aggregation/ (in particular the section that explains how annotate() interacts with values()).
The SQL generated is something like:
SELECT "impression"."user_ip", COUNT("impression"."user_ip") AS "ip_count"
FROM "impression"
GROUP BY "impression"."user_ip"
HAVING COUNT("impression"."user_ip") > 1000;
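If you want to see that GROUP BY / HAVING shape in action without a Django project, here's a small sketch using the standard sqlite3 module. The table, the sample IPs and the busy_ips helper are invented for the demo, and the threshold is a parameter so it can be tried with a handful of rows instead of 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impression (id INTEGER PRIMARY KEY, user_ip TEXT)")
conn.executemany(
    "INSERT INTO impression (user_ip) VALUES (?)",
    [("10.0.0.1",)] * 3 + [("10.0.0.2",)] * 1 + [("10.0.0.3",)] * 2,
)


def busy_ips(conn, threshold):
    """Return (user_ip, count) pairs for IPs with more than `threshold` rows."""
    return conn.execute(
        """
        SELECT user_ip, COUNT(user_ip) AS ip_count
        FROM impression
        GROUP BY user_ip
        HAVING COUNT(user_ip) > ?
        ORDER BY user_ip
        """,
        (threshold,),
    ).fetchall()


print(busy_ips(conn, 1))  # [('10.0.0.1', 3), ('10.0.0.3', 2)]
```

The database does the counting in a single pass, which is what makes this so much cheaper than issuing one count() query per impression.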
I'd like to append or chain several Querysets in Django, preserving the order of each one (not the result). I'm using a third-party library to paginate the result, and it only accepts lists or querysets. I've tried these options:
Queryset join: Doesn't preserve ordering in individual querysets, so I can't use this.
result = queryset_1 | queryset_2
Using itertools: Calling list() on the chain object actually evaluates the querysets, and this could cause a lot of overhead, couldn't it?
result = list(itertools.chain(queryset_1, queryset_2))
How do you think I should proceed?
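For reference, a small sketch of what the itertools option does. itertools.chain itself is lazy; it is the list() call that pulls every item from each source in turn, keeping each source's own order. fake_queryset is a hypothetical stand-in for a lazy queryset, used here only so the example runs without Django.

```python
import itertools

evaluated = []  # records the moment each source is first iterated


def fake_queryset(name, items):
    """Hypothetical lazy source: nothing happens until iteration starts."""
    evaluated.append(name)
    yield from items


chained = itertools.chain(
    fake_queryset("first", [3, 1, 2]),
    fake_queryset("second", [9, 8]),
)
before = list(evaluated)  # still empty: building the chain touched nothing
result = list(chained)    # consumes both sources, one after the other

print(before)     # []
print(result)     # [3, 1, 2, 9, 8]
print(evaluated)  # ['first', 'second']
```

With real querysets, each one runs its query when the chain reaches it, so list() does load every row of both into memory before the paginator sees anything.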
This solution prevents duplicates:
from django.db.models import Case, IntegerField, Q, Value, When

q1 = Q(...)
q2 = Q(...)
q3 = Q(...)
qs = (
    Model.objects
    .filter(q1 | q2 | q3)
    .annotate(
        search_type_ordering=Case(
            When(q1, then=Value(2)),
            When(q2, then=Value(1)),
            When(q3, then=Value(0)),
            default=Value(-1),
            output_field=IntegerField(),
        )
    )
    .order_by('-search_type_ordering', ...)
)
If the querysets are of different models, you have to evaluate them to lists and then you can just append:
result = list(queryset_1) + list(queryset_2)
If they are the same model, you should combine the queries using the Q object and 'order_by("queryset_1 field", "queryset_2 field")'.
The right answer largely depends on why you want to combine these and how you are going to use the results.
So, inspired by Peter's answer this is what I did in my project (Django 2.2):
from django.db import models
from .models import MyModel

# Add an extra field to each query with a constant value
queryset_0 = MyModel.objects.annotate(
    qs_order=models.Value(0, models.IntegerField())
)
# Each constant should basically act as the position where we want the
# queryset to stay
queryset_1 = MyModel.objects.annotate(
    qs_order=models.Value(1, models.IntegerField())
)
[...]
queryset_n = MyModel.objects.annotate(
    qs_order=models.Value(n, models.IntegerField())
)
# Finally, I ordered the union result by that extra field.
union = queryset_0.union(
    queryset_1,
    queryset_2,
    [...],
    queryset_n).order_by('qs_order')
With this, I could order the resulting union as I wanted without changing any private attribute while only evaluating the querysets once.
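The same idea expressed in raw SQL, as a rough sketch with sqlite3: each branch of the UNION carries a constant qs_order column, and the combined result is sorted on that column first and on the per-branch ordering second. The item table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, country TEXT, status TEXT);
    INSERT INTO item (country, status) VALUES
        ('US', NULL), ('FR', 'done'), ('DE', NULL), ('BE', 'done');
""")

rows = conn.execute("""
    SELECT id, country, 0 AS qs_order FROM item WHERE status IS NULL
    UNION
    SELECT id, country, 1 AS qs_order FROM item WHERE status IS NOT NULL
    ORDER BY qs_order, country
""").fetchall()

countries = [country for _id, country, _order in rows]
print(countries)  # ['DE', 'US', 'BE', 'FR']
```

The rows with an empty status come first (sorted among themselves), followed by the rest, and everything happens in one database query.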
I'm not 100% sure this solution works in every possible case, but it looks like the result is the union of two QuerySets (on the same model) preserving the order of the first one:
union = qset1.union(qset2)
union.query.extra_order_by = qset1.query.extra_order_by
union.query.order_by = qset1.query.order_by
union.query.default_ordering = qset1.query.default_ordering
union.query.get_meta().ordering = qset1.query.get_meta().ordering
I did not test it extensively, so before you use that code in production, make sure it behaves as expected.
If you need to merge two querysets into a third queryset, here is an example, using _result_cache.
model
class ImportMinAttend(models.Model):
    country = models.CharField(max_length=2, blank=False, null=False)
    status = models.CharField(max_length=5, blank=True, null=True, default=None)
From this model, I want to display a list of all the rows such that:
(query 1) rows with an empty status go first, ordered by country
(query 2) rows with a non-empty status go second, ordered by country
I want to merge query 1 and query 2.
# get all the objects
queryset = ImportMinAttend.objects.all()
# get the first queryset
queryset_1 = queryset.filter(status=None).order_by("country")
# len or anything that hits the database, so that _result_cache is populated
len(queryset_1)
# get the second queryset
queryset_2 = queryset.exclude(status=None).order_by("country")
# append the second queryset to the first one AND PRESERVE ORDER
for query in queryset_2:
    queryset_1._result_cache.append(query)
# final result
queryset = queryset_1
It might not be very efficient, but it works :).
For Django 1.11 (released on April 4, 2017) use union() for this, documentation here:
https://docs.djangoproject.com/en/1.11/ref/models/querysets/#django.db.models.query.QuerySet.union
Here is the Version 2.1 link to this:
https://docs.djangoproject.com/en/2.1/ref/models/querysets/#union
Use the union() function to combine multiple querysets, rather than the OR (|) operator. This avoids a very inefficient OUTER JOIN query that reads the entire table.
If the two querysets have a common field, you can order the combined queryset by that field. The querysets are not evaluated during this operation.
For example:
class EventsHistory(models.Model):
    id = models.IntegerField(primary_key=True)
    event_time = models.DateTimeField()
    event_id = models.IntegerField()

class EventsOperational(models.Model):
    id = models.IntegerField(primary_key=True)
    event_time = models.DateTimeField()
    event_id = models.IntegerField()

qs1 = EventsHistory.objects.all()
qs2 = EventsOperational.objects.all()
qs_combined = qs2.union(qs1).order_by('event_time')
I don't have much experience with Django (I'm using 1.3), so I have the feeling in the back of my head that this is a dumb question... But anyway:
I have models like this:
class User(models.Model):
    name = models.CharField()

class Product(models.Model):
    name = models.CharField()
    public = models.BooleanField()

class Order(models.Model):
    user = models.ForeignKey(User)
    product = models.ManyToManyField(Product, through='OrderProduct')

class OrderProduct(models.Model):
    product = models.ForeignKey(Product)
    order = models.ForeignKey(Order)
    expiration = models.DateField()
And let's say I do some query like this
Product.objects.filter(order__status='completed', order__user__id=2)
So I'd get all the products that User2 bought (let's say it's just Product1). Cool. But now I want the expiration for that product, but if I call Product1.orderproduct_set.all() I'm gonna get every entry of OrderProduct with Product1, but I just want the one returned from my queryset.
I know I can just run a different query on OrderProducts, but that would be another hit on the database just to bring back data the query I ran before can already get. .query on it gives me:
SELECT "shop_product"."id", "shop_product"."name"
FROM "shop_product"
INNER JOIN "shop_orderproducts" ON ("shop_product"."id" = "shop_orderproducts"."product_id")
INNER JOIN "shop_order" ON ("shop_orderproducts"."order_id" = "shop_order"."id")
WHERE ("shop_order"."user_id" = 2 AND "shop_order"."status" = completed )
ORDER BY "shop_product"."ordering" ASC
If I could SELECT * instead of specific fields I'd have all the data that I need in one query. Is there any way to build that query and get only the data related to it?
EDIT
I feel I need to clarify some points, I'm sorry I haven't been clearer:
I'm not querying against OrderProduct because some products are public and don't have to be bought, but I still have to list them, and they would not be returned by a query against OrderProduct.
The result I'm expecting is a list of products, along with their Order data (in case they have it). In JSON, it'd look somewhat like this:
[{id: 1, order: 1, expiration: 2013-03-03, public: false},
{id: 1, order: , expiration: , public: true}]
Thanks
I'm gonna get every entry of OrderProduct with Product1, but I just
want the one returned from my queryset.
You just want which "one"? Your query is filtering on the Product model, so all Users, Orders, and OrderProducts associated with each of the Products in the returned queryset will be accessible.
If you want one specific OrderProduct, then you should be filtering on that model, e.g. op = OrderProduct.objects.filter(xxxxx)[0], and then accessing the models up the chain like so:
op.product, op.order, etc.
I would have suggested the method prefetch_related, but this isn't available in Django 1.3.
Dan Hoerst is right about selecting from OrderProduct, but that still hits the database more than necessary. We can stop that by using the select_related method.
>>> from django.db import connection
>>> len(connection.queries)
0
>>> first_result = (OrderProduct.objects.select_related("order__user", "product")
...                 .filter(order__status="completed",
...                         order__user__pk=2)[0])
>>> len(connection.queries)
1
>>> name = first_result.order.user.name
>>> len(connection.queries)
1
>>> product_name = first_result.product.name
>>> len(connection.queries)
1