I would like to write a Django queryset that is equivalent to the query below and hits the database only once. Right now I am using manager.raw() to execute it.
With annotate, I can generate the inner query, but I can't use it in the filter condition (when I checked queryset.query, it looked like Ex1).
select *
from table1
where (company_id, year) in (select company_id, max(year) year
from table1
where company_id=3
and total_employees is not null
group by company_id);
Ex1:
SELECT `table1`.`company_id`, `table1`.`total_employees`
FROM `table1`
WHERE `table1`.`id` = (SELECT U0.`company_id` AS Col1, MAX(U0.`year`) AS `year`
FROM `table1` U0
WHERE NOT (U0.`total_employees` IS NULL)
GROUP BY U0.`company_id`
ORDER BY NULL)
Model:
class Table1(models.Model):
    year = models.IntegerField(null=False, validators=[validate_not_null])
    total_employees = models.FloatField(null=True, blank=True)
    company = models.ForeignKey('Company', on_delete=models.CASCADE, related_name='dummy_relation')
    last_modified = models.DateTimeField(auto_now=True)
    updated_by = models.CharField(max_length=100, null=False, default="research")

    class Meta:
        unique_together = ('company', 'year',)
I appreciate your response.
You can use OuterRef and Subquery to achieve it. Try it like this:
from django.db.models import OuterRef, Subquery
newest = Table1.objects.filter(company=OuterRef('pk'), total_employees__isnull=False).order_by('-year')
companies = Company.objects.annotate(total_employees=Subquery(newest.values('total_employees')[:1])).annotate(max_year=Subquery(newest.values('year')[:1]))
# querysets are lazy: nothing is executed until companies is evaluated, so the DB gets hit once
Show values:
# all values
companies.values('id', 'total_employees', 'max_year')
# company three values
company_three_values = companies.filter(id=3).values('id', 'total_employees', 'max_year')
Filter on Max Year:
companies_max = companies.filter(max_year__gte=2018)
FYI: OuterRef and Subquery are available in Django from version 1.11.
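To make the "one hit" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It shows roughly the SQL shape that the two Subquery annotations correspond to: one correlated subquery per annotated column, all inside a single statement. The table layout, the sample rows and the exact SQL text are assumptions for illustration; the SQL Django actually emits depends on the version and the backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY);
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        company_id INTEGER REFERENCES company(id),
        year INTEGER NOT NULL,
        total_employees REAL
    );
    INSERT INTO company (id) VALUES (3), (4);
    INSERT INTO table1 (company_id, year, total_employees) VALUES
        (3, 2017, 100.0),
        (3, 2018, 120.0),
        (3, 2019, NULL),   -- newest row, but skipped because the value is NULL
        (4, 2018, 50.0);
""")

# One statement, two correlated subqueries: each picks the newest
# non-NULL row for the company of the outer row.
rows = conn.execute("""
    SELECT c.id,
           (SELECT u.total_employees FROM table1 u
             WHERE u.company_id = c.id AND u.total_employees IS NOT NULL
             ORDER BY u.year DESC LIMIT 1) AS total_employees,
           (SELECT u.year FROM table1 u
             WHERE u.company_id = c.id AND u.total_employees IS NOT NULL
             ORDER BY u.year DESC LIMIT 1) AS max_year
      FROM company c
     WHERE c.id = 3
""").fetchall()

print(rows)
```

With this sample data, company 3 gets the 2018 row, because the 2019 row has no total_employees.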
If your model is named Table1, try this.
Table1.objects.filter(company_id=3, total_employees__isnull=False).latest('year')
This should be one hit in the db.
But latest() raises DoesNotExist if nothing matches. Better like this:
filter_item = Table1.objects.filter(company_id=3, total_employees__isnull=False).order_by('-year').first()
if filter_item:
    return filter_item
I'm trying to remove duplicate rows with distinct or annotate and then count something (MySQL 8, Django 2.2).
sql
select t1.cp_id, count(*) from (
SELECT user_id, product_id, cp_id, count(*) as cnt FROM A
where created_at between '2021-07-12' and '2021-07-13'
group by user_id, product_id, cp_id
) as t1
group by t1.cp_id
my queryset
A.objects.filter(
created_at__gte='2021-07-12',
created_at__lt='2021-07-13'
).values('cp_id', 'user_id', 'product_id').annotate(cnt=Count('cp_id')).values('cp_id').annotate(count=Count('cp_id'))
this queryset sql
SELECT A.cp_id, COUNT(cp_id) AS count FROM A
WHERE (
created_at >= 2021-07-11 19:30:00 AND
created_at < 2021-07-12 19:30:00
)
GROUP BY cp_id, created_at
ORDER BY created_at ASC
I'm confused about why Django ignores the order of my calls and runs the final aggregation against the original table (not against the de-duplicated result).
I would appreciate a solution or idea for collapsing the rows that share the same 'cp_id', 'user_id', 'product_id' (without deleting them from the database) and then counting the resulting rows per cp_id.
my models.py
class A(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    cp = models.ForeignKey(CP, on_delete=models.CASCADE)
    created_at = models.DateTimeField()
Try this query:
from django.db.models import Count
A.objects.filter(
    created_at__range=['2021-07-12', '2021-07-14'],  # dates must be quoted strings (or date objects)
).annotate(cp_count=Count('cp_id')).values('cp_count', 'user_id', 'product_id', 'created_at').order_by('created_at')
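For checking any ORM attempt against the intended result, it may help to see the two SQL shapes side by side. The sketch below uses Python's built-in sqlite3 module with made-up sample rows (assumptions, not data from the question). It compares the nested query from the question with a single-level GROUP BY, which is what the chained values()/annotate() calls ended up as in the generated SQL shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (user_id INTEGER, product_id INTEGER, cp_id INTEGER, created_at TEXT);
    INSERT INTO A VALUES
        (1, 10, 100, '2021-07-12 08:00:00'),
        (1, 10, 100, '2021-07-12 09:00:00'),  -- duplicate (user, product, cp)
        (2, 10, 100, '2021-07-12 10:00:00'),
        (1, 11, 200, '2021-07-12 11:00:00'),
        (1, 11, 200, '2021-07-14 11:00:00');  -- outside the date window
""")

WINDOW = ("2021-07-12", "2021-07-13")

# Intended result: collapse duplicates first, then count rows per cp_id.
nested = conn.execute("""
    SELECT t1.cp_id, COUNT(*) FROM (
        SELECT user_id, product_id, cp_id FROM A
         WHERE created_at >= ? AND created_at < ?
         GROUP BY user_id, product_id, cp_id
    ) AS t1
    GROUP BY t1.cp_id ORDER BY t1.cp_id
""", WINDOW).fetchall()

# Single-level grouping: duplicates are still counted.
single_level = conn.execute("""
    SELECT cp_id, COUNT(cp_id) FROM A
     WHERE created_at >= ? AND created_at < ?
     GROUP BY cp_id ORDER BY cp_id
""", WINDOW).fetchall()

print(nested)
print(single_level)
```

The two results differ for cp_id 100, where the duplicate row is counted only by the single-level query.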
I have two models: Comments and CommentFlags
class Comments(models.Model):
    content_type = models.ForeignKey(ContentType,
                                     verbose_name=_('content type'),
                                     related_name="content_type_set_for_%(class)s",
                                     on_delete=models.CASCADE)
    object_pk = models.CharField(_('object ID'), db_index=True, max_length=64)
    content_object = GenericForeignKey(ct_field="content_type", fk_field="object_pk")
    submit_date = models.DateTimeField(_('date/time submitted'), default=None, db_index=True)
    ...
    ...
class CommentFlags(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="comment_flags",
                             on_delete=models.CASCADE)
    comment = models.ForeignKey(Comment, related_name="flags", on_delete=models.CASCADE)
    flag = models.CharField(max_length=30, db_index=True)
    ...
    ...
The CommentFlags flag field can have values such as like, dislike, etc.
Problem Statement: I want to get all Comments sorted by number of likes in descending order.
Raw Query for above problem statement:
SELECT
cmnts.*, coalesce(cmnt_flgs.num_like, 0) as num_like
FROM
comments cmnts
LEFT JOIN
(
SELECT
comment_id, Count(comment_id) AS num_like
FROM
comment_flags
WHERE
flag='like'
GROUP BY comment_id
) cmnt_flgs
ON
cmnt_flgs.comment_id = cmnts.id
ORDER BY
num_like DESC
I have not been able to convert the above query to a Django ORM queryset.
What I have tried so far...
>>> qs = (Comment.objects.filter(flags__flag='like').values('flags__comment_id')
...       .annotate(num_likes=Count('flags__comment_id')))
which generates a different query.
>>> print(qs.query)
SELECT "comment_flags"."comment_id",
COUNT("comment_flags"."comment_id") AS "num_likes"
FROM "comments"
INNER JOIN "comment_flags"
ON ("comments"."id" = "comment_flags"."comment_id")
WHERE "comment_flags"."flag" = 'like'
GROUP BY "comment_flags"."comment_id",
"comments"."submit_date"
ORDER BY "comments"."submit_date" ASC
LIMIT 21
The problem with the above ORM queryset is that it uses an INNER JOIN, and I also don't know why it adds submit_date to the GROUP BY clause.
Can you please suggest a way to convert the raw query above to a Django ORM queryset?
You can try using the filter argument of Count:
from django.db.models import Count, Q
qs = (Comment.objects.all()
      .annotate(num_likes=Count('flags__comment_id', filter=Q(flags__flag='like')))
      .order_by('-num_likes'))
It may produce a slightly different query than you're expecting, depending on the database backend, but it should have equivalent behavior.
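As a rough illustration of why this behaves like the raw query, the sketch below runs a LEFT OUTER JOIN with a conditional count in Python's built-in sqlite3 module. The CASE WHEN form is an assumption about the kind of SQL a filtered Count can turn into; the exact statement varies by backend and Django version. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY);
    CREATE TABLE comment_flags (
        id INTEGER PRIMARY KEY,
        comment_id INTEGER REFERENCES comments(id),
        flag TEXT
    );
    INSERT INTO comments (id) VALUES (1), (2), (3);
    INSERT INTO comment_flags (comment_id, flag) VALUES
        (1, 'like'), (1, 'dislike'),
        (2, 'like'), (2, 'like'),
        (3, 'dislike');
""")

# The LEFT OUTER JOIN keeps comments without likes; the CASE expression
# yields NULL for non-like flags, and COUNT skips NULLs.
likes_per_comment = conn.execute("""
    SELECT c.id,
           COUNT(CASE WHEN f.flag = 'like' THEN f.comment_id END) AS num_likes
      FROM comments c
      LEFT OUTER JOIN comment_flags f ON f.comment_id = c.id
     GROUP BY c.id
     ORDER BY num_likes DESC, c.id
""").fetchall()

print(likes_per_comment)
```

Comment 3 has no likes but still appears with a count of 0, which is what the coalesce in the raw query was for.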
I have following DB model:
class Table1( models.Model ):
    sctg = models.CharField(max_length=100, verbose_name="Sctg")
    emailAddress = models.CharField(max_length=100, verbose_name="Email Address", default='')

    def __unicode__(self):
        return str( self.sctg )

class Table2( models.Model ):
    sctg = models.ForeignKey( Table1 )
    street = models.CharField(max_length=100, verbose_name="Street")
    zipCode = models.CharField(max_length=100, verbose_name="Zip Code")

    def __unicode__(self):
        return str( self.sctg )
and I would like to execute select query.
This is what I did:
sctg = Table1.objects.get( sctg = self.sctg )
data = Table2.objects.get( sctg = sctg )
and it works, but now I am executing 2 queries. Is there a way to do this in only one? In raw SQL I'd do a JOIN query, but I have no idea how to do this with Django models.
You can use two consecutive underscores to look "through" a ForeignKey reference. So your query is equivalent to:
Table2.objects.get(sctg__sctg=self.sctg)
The first sctg (before the double underscore) thus looks through the ForeignKey, whereas the second sctg corresponds to the CharField column of Table1.
Note that:
- it is possible that there is no such Table2 element, or multiple. In both cases .get(..) will raise an error (DoesNotExist or MultipleObjectsReturned). In case you want to retrieve all (possibly none), you can use .filter(..) instead of .get(..);
- here self.sctg should be a string (or something string-like) since the sctg of Table1 is a CharField.
The above will result in some sort of query like:
SELECT t2.*
FROM table2 AS t2
INNER JOIN table1 AS t1 ON t2.sctg_id = t1.id
WHERE t1.sctg = 'mysctg'
where 'mysctg' is the value stored in your self.sctg.
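The join can be tried outside Django with Python's built-in sqlite3 module. This is only a sketch: the table and column names mirror the models above, while the helper name table2_rows_for and the sample rows are made up for illustration. It returns a list, so "no match" is an empty list rather than an error, much like .filter(..).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, sctg TEXT, emailAddress TEXT);
    CREATE TABLE table2 (
        id INTEGER PRIMARY KEY,
        sctg_id INTEGER REFERENCES table1(id),
        street TEXT,
        zipCode TEXT
    );
    INSERT INTO table1 (id, sctg, emailAddress) VALUES (1, 'mysctg', 'a@example.com');
    INSERT INTO table2 (sctg_id, street, zipCode) VALUES (1, 'Main St', '12345');
""")

def table2_rows_for(sctg):
    """Hypothetical helper: one JOIN query instead of two separate lookups."""
    return conn.execute("""
        SELECT t2.street, t2.zipCode
          FROM table2 AS t2
          INNER JOIN table1 AS t1 ON t2.sctg_id = t1.id
         WHERE t1.sctg = ?
    """, (sctg,)).fetchall()

print(table2_rows_for("mysctg"))
print(table2_rows_for("missing"))
```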
I need to use different order_by & distinct values, and I have made an attempt using a subquery.
How can I achieve this?
Could one queryset select the Products I want, and then a separate query select the 15 Variations whose prices I want to display?
In other words: one queryset randomly selects product IDs, then Python asks for a queryset of just those 15 items.
Speeding up the query is important too, as it takes ~800ms when I order by the pk, or 5.8 seconds when I use order_by('?').
My attempt:
distinct_qs = (
    Product.objects
    .distinct('id')
)
qset = (
    Product.objects
    .filter(pk__in=distinct_qs)
    .order_by('rating', '?')
    .values('name', 'image',)
    .annotate(
        price=F('variation__price__price'),
        id=F('pk'),
        vari=F('variation'),
    )[:15]
)
Sample of output data:
{"name":"Test Item","vari":10, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Test Item","vari":11, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Another one","vari":14, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"Another one","vari":15, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"And Again","vari":17, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
{"name":"And Again","vari":18, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
Desired output data:
{"name":"Test Item","vari":13, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Another one","vari":14, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"And Again","vari":17, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
Sample of models.py
class Product(models.Model):
    name = models.CharField ("Name", max_length=400)
    ...
class Variation(models.Model):
    product = models.ForeignKey(Product, db_index=True, blank=False, null=False)
    ...
class Image(models.Model):
    variation = models.ForeignKey(Variation, blank=False, null=False)
    image = models.URLField(max_length=540, blank=True, null=True)
class Price(models.Model):
    price = models.DecimalField("Price", decimal_places=2, max_digits=10)
    variation = models.ForeignKey(Variation, blank=False, null=False)
I think you should write a custom model manager (see https://docs.djangoproject.com/en/1.9/topics/db/managers/ ) and create a method there which you would then use to return variations instead of a standard query.
For randomising, you could do it like this:
select the last id of Variation (or Product), generate 15 distinct random ids from that interval, and then just pull the objects with those ids from the database. I think it should work faster than order_by('?').
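The id-picking step could be sketched in plain Python as below. The function name pick_random_ids and its parameters are hypothetical, and the sketch assumes ids start at 1 and are mostly contiguous. If rows have been deleted, some of the picked ids will not exist, so fewer than 15 objects may come back.

```python
import random

def pick_random_ids(last_id, count=15, rng=random):
    """Return up to `count` distinct ids drawn uniformly from 1..last_id."""
    count = min(count, last_id)  # cannot draw more distinct ids than exist
    return rng.sample(range(1, last_id + 1), count)

# Example: last_id would come from the database, e.g. the highest Product pk.
candidate_ids = pick_random_ids(last_id=1000)
print(candidate_ids)
```

The resulting list could then be passed to a lookup such as Product.objects.filter(pk__in=candidate_ids). Asking for a few more ids than needed is one way to compensate for gaps.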
I am using Django==1.8.7 and I have the following models
# a model in users.py
class User(models.Model):
    id = models.AutoField(primary_key=True)
    username = models.CharField(max_length=100, blank=True)
    displayname = models.CharField(max_length=100, blank=True)
    # other fields deleted
# a model in healthrepo.py
class Report(models.Model):
    id = models.AutoField(primary_key=True)
    uploaded_by = models.ForeignKey(User, related_name='uploads',
                                    db_index=True)
    owner = models.ForeignKey(User, related_name='reports', db_index=True)
    # other fields like dateofreport, deleted
I use the following Django queryset:
Report.objects.filter(owner__id=1).values('uploaded_by__username',
'uploaded_by__displayname').annotate(
total=Count('uploaded_by__username')
)
I see that this generates the following query:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname", "healthrepo_report"."dateofreport", "healthrepo_report"."owner_id", "healthrepo_report"."uploaded_by_id"
ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
However, what I really wanted was just grouping based on "healthrepo_report"."owner_id" and not multiple fields. i.e. What I wanted was:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname" ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
I am wondering why this is happening and how I can get grouping based on a single column.
I just saw this post:
Django annotate and values(): extra field in 'group by' causes unexpected results
Changing the query by adding an empty order_by() fixes it, because it clears the model's default ordering, whose fields were being added to the GROUP BY:
Report.objects.filter(owner__id=1).values('uploaded_by__username',
'uploaded_by__displayname').annotate(
total=Count('uploaded_by__username')
).order_by()
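The effect of the extra GROUP BY column can be reproduced with Python's built-in sqlite3 module. This is a sketch with made-up rows and a reduced set of columns (assumptions for illustration), comparing the grouping from the generated SQL in the question with the grouping that remains once the ordering column is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE healthrepo_report (
        id INTEGER PRIMARY KEY,
        uploaded_by_id INTEGER REFERENCES users_user(id),
        owner_id INTEGER,
        dateofreport TEXT
    );
    INSERT INTO users_user (id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO healthrepo_report (uploaded_by_id, owner_id, dateofreport) VALUES
        (2, 1, '2016-01-01'), (2, 1, '2016-01-02'), (2, 1, '2016-01-03');
""")

BASE = """
    SELECT u.username, COUNT(u.username) AS total
      FROM healthrepo_report r
      INNER JOIN users_user u ON r.uploaded_by_id = u.id
     WHERE r.owner_id = 1
     GROUP BY {group_by}
     ORDER BY {order_by}
"""

# Ordering column leaks into GROUP BY: one group per report date.
with_ordering_column = conn.execute(
    BASE.format(group_by="u.username, r.dateofreport", order_by="r.dateofreport DESC")
).fetchall()

# Ordering cleared: one group per uploader.
without_ordering_column = conn.execute(
    BASE.format(group_by="u.username", order_by="u.username")
).fetchall()

print(with_ordering_column)
print(without_ordering_column)
```

With three reports uploaded by the same user on different dates, the first query returns three rows with a total of 1 each, and the second returns a single row with a total of 3.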