I don't have much experience with Django (I'm using 1.3), so I have a feeling in the back of my head that this is a dumb question... But anyway:
I have models like this:
class User(models.Model):
    name = models.CharField()

class Product(models.Model):
    name = models.CharField()
    public = models.BooleanField()

class Order(models.Model):
    user = models.ForeignKey(User)
    product = models.ManyToManyField(Product, through='OrderProduct')

class OrderProduct(models.Model):
    product = models.ForeignKey(Product)
    order = models.ForeignKey(Order)
    expiration = models.DateField()
And let's say I run a query like this:
Product.objects.filter(order__status='completed', order__user__id=2)
So I'd get all the products that User2 bought (let's say it's just Product1). Cool. But now I want the expiration for that product, but if I call Product1.orderproduct_set.all() I'm gonna get every entry of OrderProduct with Product1, but I just want the one returned from my queryset.
I know I can just run a separate query on OrderProduct, but that would be another hit on the database just to bring back data that the query I already ran could fetch. Printing the queryset's .query gives me:
SELECT "shop_product"."id", "shop_product"."name"
FROM "shop_product"
INNER JOIN "shop_orderproducts" ON ("shop_product"."id" = "shop_orderproducts"."product_id")
INNER JOIN "shop_order" ON ("shop_orderproducts"."order_id" = "shop_order"."id")
WHERE ("shop_order"."user_id" = 2 AND "shop_order"."status" = completed )
ORDER BY "shop_product"."ordering" ASC
If I could SELECT * instead of specific fields, I'd have all the data I need in one query. Is there any way to build that query and get only the data related to it?
EDIT
I feel I need to clarify some points; I'm sorry I haven't been clearer:
I'm not querying against OrderProduct because some products are public and don't have to be bought, but I still have to list them, and they wouldn't be returned by a query against OrderProduct.
The result I'm expecting is a list of products, along with their order data (when they have any). In JSON, it would look something like this:
[{"id": 1, "order": 1, "expiration": "2013-03-03", "public": false},
 {"id": 1, "order": null, "expiration": null, "public": true}]
Thanks
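In plain SQL, the result described in the EDIT (every product visible to a user, with order data where it exists and public products included) is a LEFT OUTER JOIN against that user's completed order lines. The sketch below uses Python's built-in sqlite3 module rather than the Django ORM; the shop_* table layout, the sample rows, and the products_for_user helper are all hypothetical, loosely modelled on the models and SQL above.

```python
import sqlite3

# Hypothetical tables modelled on the question's SQL (not Django-generated).
SCHEMA = """
CREATE TABLE shop_product (id INTEGER PRIMARY KEY, name TEXT, public INTEGER);
CREATE TABLE shop_order (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
CREATE TABLE shop_orderproduct (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    order_id INTEGER,
    expiration TEXT
);
"""

# The subquery keeps only this user's completed order lines; the LEFT JOIN
# then attaches them to products, leaving NULLs for products never bought.
QUERY = """
SELECT p.id, mine.order_id, mine.expiration, p.public
FROM shop_product AS p
LEFT JOIN (
    SELECT op.product_id, op.order_id, op.expiration
    FROM shop_orderproduct AS op
    JOIN shop_order AS o ON o.id = op.order_id
    WHERE o.user_id = ? AND o.status = 'completed'
) AS mine ON mine.product_id = p.id
WHERE mine.order_id IS NOT NULL OR p.public = 1
ORDER BY p.id
"""


def products_for_user(conn, user_id):
    """Return dicts shaped like the JSON in the EDIT (None where there is no order)."""
    rows = conn.execute(QUERY, (user_id,)).fetchall()
    return [
        {"id": pid, "order": order_id, "expiration": expiration, "public": bool(public)}
        for pid, order_id, expiration, public in rows
    ]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO shop_product VALUES (?, ?, ?)",
                 [(1, "Product1", 0), (2, "Product2", 1), (3, "Product3", 0)])
conn.executemany("INSERT INTO shop_order VALUES (?, ?, ?)",
                 [(1, 2, "completed"), (2, 3, "completed")])
conn.executemany("INSERT INTO shop_orderproduct VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "2013-03-03"), (2, 3, 2, "2014-01-01")])

print(products_for_user(conn, 2))
```

For user 2 this yields Product1 with its order and expiration, plus the public Product2 with None in the order fields; Product3, bought only by user 3, is left out. Putting the user/status conditions inside the subquery (rather than in the outer WHERE) is what keeps the outer join from discarding the public products.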
"I'm gonna get every entry of OrderProduct with Product1, but I just want the one returned from my queryset."
Which "one" do you want? Your query filters on the Product model, so all Users, Orders, and OrderProducts associated with each Product in the returned queryset remain accessible.
If you want one specific OrderProduct, query that model directly, e.g. op = OrderProduct.objects.get(xxxxx) (filter(xxxxx) returns a queryset, so you would iterate over it instead), and then access the models up the chain:
op.product, op.order, etc.
I would have suggested prefetch_related, but it isn't available in Django 1.3 (it was added in 1.4).
Dan Hoerst is right about selecting from OrderProduct, but that approach still hits the database more than necessary. We can avoid the extra queries with the select_related method. (Note that connection.queries is only populated when DEBUG = True.)
>>> from django.db import connection
>>> len(connection.queries)
0
>>> first_result = (OrderProduct.objects
...                 .select_related("order__user", "product")
...                 .filter(order__status="completed", order__user__pk=2)[0])
>>> len(connection.queries)
1
>>> name = first_result.order.user.name
>>> len(connection.queries)
1
>>> product_name = first_result.product.name
>>> len(connection.queries)
1
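The effect of select_related can be illustrated outside Django: following a foreign key lazily costs extra queries for every row, while a JOIN fetches the related rows in the same query. The sketch below uses sqlite3 with hypothetical tables and a hypothetical CountingConnection wrapper that plays the role of len(connection.queries); it only mimics the idea, and the SQL Django actually generates will differ.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that records every statement, like connection.queries."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = []

    def execute(self, sql, params=()):
        self.queries.append(sql)
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    # Setup runs on the raw connection, so it is not counted.
    conn.executescript("""
        CREATE TABLE shop_user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE shop_order (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
        CREATE TABLE shop_orderproduct (id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO shop_user VALUES (2, 'alice');
        INSERT INTO shop_order VALUES (1, 2, 'completed'), (2, 2, 'completed');
        INSERT INTO shop_orderproduct VALUES (1, 1), (2, 2);
    """)
    return CountingConnection(conn)


def lazy_user_names(db):
    # One query for the rows, then two more per row to follow order -> user.
    names = []
    for (order_id,) in db.execute(
            "SELECT order_id FROM shop_orderproduct ORDER BY id").fetchall():
        (user_id,) = db.execute(
            "SELECT user_id FROM shop_order WHERE id = ?", (order_id,)).fetchone()
        (name,) = db.execute(
            "SELECT name FROM shop_user WHERE id = ?", (user_id,)).fetchone()
        names.append(name)
    return names


def joined_user_names(db):
    # A single query: the JOINs bring the related rows along, as select_related does.
    rows = db.execute("""
        SELECT u.name
        FROM shop_orderproduct AS op
        JOIN shop_order AS o ON o.id = op.order_id
        JOIN shop_user AS u ON u.id = o.user_id
        ORDER BY op.id
    """).fetchall()
    return [name for (name,) in rows]


lazy_db, joined_db = make_db(), make_db()
print(lazy_user_names(lazy_db), len(lazy_db.queries))        # ['alice', 'alice'] 5
print(joined_user_names(joined_db), len(joined_db.queries))  # ['alice', 'alice'] 1
```

With two order lines the lazy version issues 1 + 2 × 2 = 5 statements and grows with the row count, while the joined version stays at one.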
I have Model "A" that both relates to another model and acts as a public face for the actual data (Model "B"); users can modify the contents of A but not of B.
For every B there can be many As (a one-to-many relation).
When I display this model, any time two or more As are related to the same B I see "duplicate" records with (almost always) the same data, which is a bad experience.
I want to return a queryset of A items that relate to the B items and, when there's more than one, roll them up to the first entered item.
I also want to count how many A items relate to each B and return that count, to indicate how much duplication there is.
I wrote the following analogous SQL query, which counts the related items and uses FIRST_VALUE to find the first A created within each B partition:
SELECT *
FROM
(
    SELECT
        COUNT(*) OVER (PARTITION BY b_id) AS count_related_items,
        FIRST_VALUE(id) OVER (PARTITION BY b_id ORDER BY created_time ASC) AS first_filter,
        *
    FROM A
) AS A1
WHERE
    A1.first_filter = A1.id;
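To check what the window-function query returns, it can be run against an in-memory SQLite database from Python's standard library (window functions need SQLite 3.25 or newer). The table a, its created_time column, the sample rows, and the rollup helper are hypothetical; in terms of the models shown below, b_id plays the role of UserData.core_id. The inner SELECT lists its columns explicitly instead of using *.

```python
import sqlite3

# Same shape as the question's query; the inner SELECT names its columns.
ROLLUP_SQL = """
SELECT id, b_id, count_related_items
FROM (
    SELECT
        id,
        b_id,
        COUNT(*) OVER (PARTITION BY b_id) AS count_related_items,
        FIRST_VALUE(id) OVER (PARTITION BY b_id ORDER BY created_time ASC) AS first_filter
    FROM a
) AS a1
WHERE a1.first_filter = a1.id
ORDER BY b_id
"""


def rollup(rows):
    """Return (first A id, b_id, number of As for that b_id), one tuple per B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER, created_time TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
    return conn.execute(ROLLUP_SQL).fetchall()


sample = [
    (1, 10, "2020-01-01"),
    (2, 10, "2020-01-02"),
    (3, 20, "2020-01-03"),
    (4, 10, "2019-12-31"),  # earliest A for b_id 10, although inserted last
    (5, 20, "2020-01-04"),
]
print(rollup(sample))  # [(4, 10, 3), (3, 20, 2)]
```

Each B collapses to its earliest A, and the count column shows how many As were rolled up into it.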
As requested, here's a simplified view of the models:
class CoreData(models.Model):
    title = models.CharField(max_length=500)

class UserData(models.Model):
    core = models.ForeignKey("CoreData", on_delete=models.CASCADE)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
When a user creates data, the app first checks for (or creates) the CoreData, storing things like the title, and then creates the UserData with a reference to that CoreData.
The "duplication" is introduced when a second user creates a piece of data that references the same CoreData; that is why you can roll up the UserData (in SQL) to find the count and the "first" entry in the one-to-many relation.
Assuming my understanding is correct:
If you are querying from the UserData model, the query would look something like this (considering CoreData.id = 18):
from django.db.models import Count
user_data = (UserData.objects.filter(core__id=18).order_by("created_time")
             # "userdata" is the default reverse query name for UserData.core
             .annotate(duplicate_count=Count("core__userdata")).first())
user_data would be the first object created that is related to the CoreData object, and user_data.duplicate_count gives the number of UserData objects related to that CoreData object.
See the Django reference docs on annotate().
Update:
If you need the full list of UserData objects for a specific CoreData, drop the .first() call:
user_data = (UserData.objects.filter(core__id=18).order_by("created_time")
             .annotate(duplicate_count=Count("core__userdata")))
Here are my classes. What is a good way to query a User for its embedded Review objects and paginate them?
class Review(EmbeddedDocument):
    review_body = StringField(required=True)
    review_stars = FloatField(required=True, default=0)
    reviewer = LazyReferenceField('User')
    review_date = DateTimeField()

class User(Document):
    userreviews = EmbeddedDocumentListField(Review)
I'm sure I can get the total easily with .count(), but I'm not sure how to skip ahead and query for only a portion of the objects.
The answer, it seems, lies in the way I was querying the database. I was doing it incorrectly for what I was trying to accomplish.
What wasn't working:
userprofile: user.User = user.User.objects(
    username=request.form["username"]).get()
all_reviews = userprofile.userreviews  # work with objects here
What should work:
Using .filter() together with .fields() and a slice__ projection, I can get the embedded Review objects like so:
reviews = user.User.objects.filter(username=request.form["username"]).fields(
    slice__userreviews=[0, 2])
for r in reviews:
    print(r.userreviews)
This returns 2 Review objects starting at index 0. Then I just need .count() to get the total, and I have the three pieces I need for pagination.
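Since the slice__ projection takes a [skip, limit] pair (MongoDB's $slice), paging is mostly arithmetic. The helpers below (slice_args, page_count, apply_slice) are hypothetical and pure Python; apply_slice only imitates what $slice does for a non-negative skip, so the arithmetic can be checked without a database.

```python
import math


def slice_args(page, per_page):
    """[skip, limit] for a 1-based page, e.g. for fields(slice__userreviews=...)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return [(page - 1) * per_page, per_page]


def page_count(total, per_page):
    """Number of pages needed for `total` items (at least one, even when empty)."""
    return max(1, math.ceil(total / per_page))


def apply_slice(items, skip_limit):
    """Imitate $slice with [skip, limit] and a non-negative skip, on a plain list."""
    skip, limit = skip_limit
    return items[skip:skip + limit]


reviews = ["r1", "r2", "r3", "r4", "r5"]
print(page_count(len(reviews), 2))             # 3
print(apply_slice(reviews, slice_args(3, 2)))  # ['r5']
```

In the code above this would become .fields(slice__userreviews=slice_args(page, 2)), with the total obtained separately for page_count.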
I have the following model definitions, and I want a list of the products whose distance from a given product is less than 40:
class Product(models.Model):
    title = models.CharField(max_length=255)
    near_duplicate_images = models.ManyToManyField("self", through="NearDuplicate")

class NearDuplicate(models.Model):
    first_product = models.ForeignKey(Product, on_delete=models.CASCADE, related_name="first_product")
    second_product = models.ForeignKey(Product, on_delete=models.CASCADE, related_name="second_product")
    distance = models.IntegerField(null=True, blank=True)
I've tried doing this to access the relation directly:
p = Product.objects.filter(near_duplicate_images__distance__lt=40).prefetch_related('near_duplicate_images')
But it raises this exception:
django.core.exceptions.FieldError: Related Field got invalid lookup: distance
I've also tried doing this
p = Product.objects.all().prefetch_related(Prefetch("near_duplicate_images", queryset=NearDuplicate.objects.filter(distance__lt=40), to_attr="near_duplicate_images_list"))
But it raises this exception:
django.core.exceptions.FieldError: Cannot resolve keyword 'near_duplicate_images_rel_+' into field. Choices are: distance, first_product, first_product_id, id, second_product, second_product_id
I think a query like this should work, since you want the list of products whose distance is less than 40:
products = NearDuplicate.objects.filter(distance__lt=40).values('first_product', 'second_product')
This would give an output similar to
<QuerySet [{'first_product': 1, 'second_product': 2}]>
UPDATE: I have played around with the queries a bit. If you want the full list of products that appear as either first_product or second_product, you may need multiple queries, like:
q1= NearDuplicate.objects.filter(distance__lt=40).values_list('first_product', flat=True)
This would give output as
<QuerySet [1]>
and
q2 = NearDuplicate.objects.filter(distance__lt=40).values_list('second_product', flat=True)
This would give output as
<QuerySet [2]>
The first query lists all the products in first_product, and the second lists all the products in second_product. Now you can merge the two; union() already removes duplicates (it uses SQL UNION unless you pass all=True), so no extra .distinct() is needed:
q1.union(q2)
This would give the final output as
<QuerySet [1, 2]>
I hope it helps. :)
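Because each NearDuplicate row stores a pair in one direction, a product can sit in either column, which is why the answer above needs two queries. If what you want is, per product, the list of its near duplicates under the threshold, the pairs can be folded into a symmetric mapping. The sketch below is plain Python over (first_product_id, second_product_id, distance) tuples of the kind NearDuplicate.objects.values_list('first_product', 'second_product', 'distance') returns; the near_duplicates helper and the sample pairs are hypothetical.

```python
from collections import defaultdict


def near_duplicates(pairs, max_distance=40):
    """Map each product id to the sorted ids within max_distance, in either direction."""
    neighbours = defaultdict(set)
    for first, second, distance in pairs:
        # distance is nullable in the model, so rows without one are skipped
        if distance is None or distance >= max_distance:
            continue
        neighbours[first].add(second)
        neighbours[second].add(first)
    return {pid: sorted(ids) for pid, ids in neighbours.items()}


pairs = [(1, 2, 10), (1, 3, 55), (2, 4, 39), (3, 4, None)]
print(near_duplicates(pairs))  # {1: [2], 2: [1, 4], 4: [2]}
```

Adding both directions for every qualifying row is the in-memory counterpart of combining the first_product and second_product queries.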
I don't think you need the "near_duplicate_images" field.
Try something like this (I haven't tested it):
p = Product.objects.filter(pk__in=NearDuplicate.objects.filter(first_product=current_product, distance__lt=40).values('second_product'))
See the QuerySet API reference for the __in lookup.
I have a model that tracks the number of impressions for ads.
class Impression(models.Model):
    ad = models.ForeignKey(Ad, on_delete=models.CASCADE)
    user_ip = models.CharField(max_length=50, null=True, blank=True)
    clicked = models.BooleanField(default=False)
    time_created = models.DateTimeField(auto_now_add=True)
I want to find every user_ip that has more than 1000 impressions; in other words, every user_ip that appears in more than 1000 Impression rows. How can I do that? I wrote a function for this, but it is very inefficient and slow because it loops over every impression and runs a COUNT query for each one.
def check_ip():
    for i in Impression.objects.all():
        if Impression.objects.filter(user_ip=i.user_ip).count() > 1000:
            print(i.user_ip)
You should be able to do this in one query with aggregation. It is possible to filter on aggregate values (like Count()) as follows:
from django.db.models import Count
for ip in Impression.objects.values('user_ip').annotate(ipcount=Count('user_ip')).filter(ipcount__gt=1000):
    # each item is a dict such as {'user_ip': ..., 'ipcount': ...}
    print(ip['user_ip'])
Django querysets have an annotate() method which supports what you're trying to do.
from django.db.models import Count
Impression.objects.values('user_ip')\
.annotate(ip_count=Count('user_ip'))\
.filter(ip_count__gt=1000)
This will give you a queryset which returns dictionaries with 'user_ip' and 'ip_count' keys when used as an iterable.
To understand what's happening here, see Django's aggregation guide: https://docs.djangoproject.com/en/1.11/topics/db/aggregation/ (in particular the section explaining how annotate() interacts with values()).
The generated SQL is something like:
SELECT "impression"."user_ip", COUNT("impression"."user_ip") AS "ip_count"
FROM "impression"
GROUP BY "impression"."user_ip"
HAVING COUNT("impression"."user_ip") > 1000;
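The GROUP BY ... HAVING query lets the database do in one pass what the original check_ip loop did with one COUNT query per impression. The plain-Python sketch below mirrors that grouping with collections.Counter, purely to show the semantics; heavy_ips is a hypothetical helper, and it skips None because SQL's COUNT(column) ignores NULLs.

```python
from collections import Counter


def heavy_ips(user_ips, threshold=1000):
    """Map each IP seen more than `threshold` times to its count."""
    # COUNT("user_ip") ignores NULLs, so None values are skipped here as well.
    counts = Counter(ip for ip in user_ips if ip is not None)
    return {ip: n for ip, n in counts.items() if n > threshold}


sample = ["10.0.0.1"] * 3 + ["10.0.0.2"] * 2 + [None] * 5
print(heavy_ips(sample, threshold=2))  # {'10.0.0.1': 3}
```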
class Price(models.Model):
    date = models.DateField()
    price = models.DecimalField(max_digits=6, decimal_places=2)
    product = models.ForeignKey("Product")

class Product(models.Model):
    name = models.CharField(max_length=256)
    price_history = models.ManyToManyField(Price, related_name="product_price", blank=True)
I want to query Product so that only the products whose price on date x is higher than on any earlier date are returned.
Thanks boffins.
As Marcin said in another answer, you can drill down across relationships using the double-underscore syntax. However, you can also chain queries, and sometimes this is easier to follow logically, even though it takes more lines of code. In your case, though, I might do something like this:
First, you want to know the price on date x:
a = Price.objects.filter(date=somedate_x)  # Price rows on date x; a Product queryset has no .price
You should probably check whether there is more than one price per date:
if a.count() == 1:
    pass
else:
    pass  # do something else here
(or something like that)
Now you have your price and you know your date, so just do this:
b = Product.objects.filter(price_history__date__lt=somedate_x, price_history__price__gt=a[0].price)
Note that the slice hits the database on its own and returns an object, so this approach hits the database three times per function call: once for the count, once for the slice, and once for the actual query. You could forgo the count and the slice by using an aggregate function (like an average across all the rows returned for a day), but those can get expensive in their own right.
For more information, see the QuerySet API reference:
https://docs.djangoproject.com/en/dev/ref/models/querysets/
You can perform a query that spans relationships using this syntax:
Product.objects.filter(price_history__price = 3)
However, I'm not sure that it's possible to perform the query you want efficiently in a pure django query.
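Whatever the ORM expression ends up being, the condition being asked for is: the product has a price on date x, and that price is strictly greater than every price recorded before x (equivalently, no earlier price is greater than or equal to it). The plain-Python sketch below states that condition over in-memory data; the products_at_new_high helper, the dict-of-lists layout, and the assumption of at most one price per product per date are hypothetical. A product with no earlier prices qualifies vacuously, which may or may not be what you want.

```python
from datetime import date
from decimal import Decimal


def products_at_new_high(history, day):
    """Names of products whose price on `day` beats every earlier price.

    history maps a product name to a list of (date, Decimal price) tuples.
    Assumes at most one price per product per date (hypothetical layout).
    """
    result = []
    for name, prices in history.items():
        by_date = dict(prices)
        if day not in by_date:
            continue  # no price recorded on `day`, so the product cannot qualify
        price_on_day = by_date[day]
        earlier = [price for when, price in prices if when < day]
        # all() is True for an empty list, so a first-ever price qualifies.
        if all(price_on_day > price for price in earlier):
            result.append(name)
    return sorted(result)


history = {
    "widget": [(date(2020, 1, 1), Decimal("5.00")), (date(2020, 1, 2), Decimal("6.00"))],
    "gadget": [(date(2020, 1, 1), Decimal("9.00")), (date(2020, 1, 2), Decimal("7.00"))],
    "gizmo": [(date(2020, 1, 3), Decimal("1.00"))],
}
print(products_at_new_high(history, date(2020, 1, 2)))  # ['widget']
```

Having this reference behaviour makes it easier to check any ORM or raw-SQL version against a few known cases.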