Django documentation states to check the ORM before writing raw SQL, but I am not aware of a resource that explains how to perform inner joins between tables without foreign keys, something that would be relatively simple to execute in a few lines of SQL.
Many of the tables in my database need to be joined on time-related properties or intervals. In this basic example, I first need to take a subset of change_events and then join a different table, ais_data, on two separate columns:
models.py
from django.contrib.gis.db import models  # GIS-aware models module, needed for PointField


class AisData(models.Model):
    navstatus = models.BigIntegerField(blank=True, null=True)
    start_ts = models.DateTimeField()
    end_ts = models.DateTimeField(blank=True, null=True)
    destination = models.TextField(blank=True, null=True)
    draught = models.FloatField(blank=True, null=True)
    geom = models.PointField(srid=4326)
    imo = models.BigIntegerField(primary_key=True)

    class Meta:
        managed = False
        db_table = 'ais_data'
        unique_together = (('imo', 'start_ts'),)


class ChangeEvent(models.Model):
    event_id = models.BigIntegerField(primary_key=True)
    imo = models.BigIntegerField(blank=True, null=True)
    timestamp = models.DateTimeField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'change_event'
First I take a subset of change_events which returns a QuerySet object. Using the result of this query, I need to get all of the records in ais_data that match on imo and timestamp - so the raw SQL would look exactly like this:
WITH filtered_change_events AS (
    SELECT * FROM change_events WHERE timestamp BETWEEN now() - interval '1' day AND now()
)
SELECT fce.*, ad.geom
FROM filtered_change_events fce
JOIN ais_data ad ON fce.imo = ad.imo AND fce.timestamp = ad.start_ts
views.py
from datetime import timedelta

from django.utils import timezone

from .models import AisData, ChangeEvent

# Take "now" once so both bounds refer to the same instant
end_period = timezone.now()
start_period = end_period - timedelta(hours=24)
subset_change_events = ChangeEvent.objects.filter(
    timestamp__gte=start_period,
    timestamp__lte=end_period,
)
# inner join against AisData on imo and timestamp = start_ts:
# subset_change_events.filter(...)?
How would one write this relatively simple query using the Django ORM? I am finding it difficult to make a simple inner join on two columns without using a foreign key. Any advice or links to resources would be helpful.
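No answer is recorded for this question, so what follows is only a sketch of one possible direction. A join on imo and timestamp = start_ts can be rewritten as a correlated subquery, and a correlated subquery is the shape that Django's Subquery and OuterRef can express without a foreign key. The snippet below uses the standard-library sqlite3 module with made-up rows to check that the two SQL forms return the same result. The sample data, the simplified column types and the ORM mapping in the final comment are all assumptions, and the rewrite relies on (imo, start_ts) being unique, as the model's unique_together declares.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE change_event (event_id INTEGER PRIMARY KEY, imo INTEGER, timestamp TEXT);
    CREATE TABLE ais_data (imo INTEGER, start_ts TEXT, geom TEXT);
    INSERT INTO change_event VALUES
        (1, 100, '2024-01-01 00:00'),
        (2, 200, '2024-01-01 06:00'),
        (3, 300, '2024-01-01 07:00');
    INSERT INTO ais_data VALUES
        (100, '2024-01-01 00:00', 'POINT(1 1)'),
        (200, '2024-01-01 09:00', 'POINT(2 2)');
""")

# Form 1: the inner join on two columns from the question
join_rows = conn.execute("""
    SELECT ce.event_id, ad.geom
    FROM change_event ce
    JOIN ais_data ad ON ce.imo = ad.imo AND ce.timestamp = ad.start_ts
    ORDER BY ce.event_id
""").fetchall()

# Form 2: a correlated subquery, keeping only events that found a match
subquery_rows = conn.execute("""
    SELECT event_id, geom FROM (
        SELECT ce.event_id,
               (SELECT ad.geom FROM ais_data ad
                WHERE ad.imo = ce.imo AND ad.start_ts = ce.timestamp
                LIMIT 1) AS geom
        FROM change_event ce
    )
    WHERE geom IS NOT NULL
    ORDER BY event_id
""").fetchall()

# Untested ORM sketch of form 2 (an assumption, not a confirmed answer):
#   match = AisData.objects.filter(imo=OuterRef('imo'), start_ts=OuterRef('timestamp'))
#   subset_change_events.annotate(geom=Subquery(match.values('geom')[:1]))
#                       .filter(geom__isnull=False)
```

If several ais_data rows could match one event, the two forms would no longer be equivalent, because the subquery keeps only one row per event.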
Related
I've got a bunch of reviews in my app. Users are able to "like" reviews.
I'm trying to get the most liked reviews. However, there are some popular users on the app, and all their reviews have the most likes. I want to only select one review (ideally the most liked one) per user.
Here are my objects,
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)


class LikeReview(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='likereview_user', db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, related_name='likereview_review', db_index=True)
    date_added = models.DateTimeField()

    class Meta:
        unique_together = [['user', 'review']]
And here's what I currently have to get the most liked reviews:
reviews = Review.objects.filter().annotate(
num_likes=Count('likereview_review')
).order_by('-num_likes').distinct()
As you can see, the reviews I get are sorted by number of likes, but it's possible that the top liked reviews are all by the same user. I want to add distinct('user') here, but I get the error "annotate() + distinct(fields) is not implemented".
How can I accomplish this?
This will be a bit hard to read because of your related names. I would suggest changing the related_name of Review.user to reviews, which makes the queries much easier to follow; I've elaborated on that in the second part of the answer.
With your current setup, I managed to do it fully in the DB using subqueries:
from django.db.models import Subquery, OuterRef, Count

# No DB Queries
best_reviews_per_user = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(user=OuterRef('id'))

# No DB Queries
review_sq = Subquery(best_reviews_per_user.values('review_id')[:1])

# Still no DB query: this queryset is lazy and is embedded below as a nested subquery
best_review_ids = User.objects.all()\
    .annotate(best_review_id=review_sq)\
    .values_list('best_review_id', flat=True)

# Still lazy; the query runs when best_reviews is first evaluated
best_reviews = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(review_id__in=best_review_ids)\
    .exclude(num_likes=0)  # I assume this is the case

# Print it
for review in best_reviews:
    print(review, review.num_likes, review.user)

# Test it
assert len({review.user for review in best_reviews}) == len(best_reviews)
assert sorted([r.num_likes for r in best_reviews], reverse=True) == [r.num_likes for r in best_reviews]
assert all([r.num_likes for r in best_reviews])
Let's try with this roughly equivalent model structure:
from django.db import models
from django.utils import timezone


class TimestampedModel(models.Model):
    """This makes your life much easier and is pretty DRY"""
    created = models.DateTimeField(default=timezone.now)

    class Meta:
        abstract = True


class Review(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='reviews', db_index=True)
    text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField()
    likes = models.ManyToManyField(User, through='ReviewLike')


class ReviewLike(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, db_index=True)
The likes are a clear many-to-many relationship between reviews and users with an extra timestamp column, which is a textbook use case for a through model (see the Django docs on ManyToManyField.through).
Now everything is imho much much easier to read.
from django.db.models import OuterRef, Count, Subquery
# No DB Queries
best_reviews = Review.objects.all()\
    .annotate(like_count=Count('likes'))\
    .exclude(like_count=0)\
    .order_by('-like_count')

# No DB Queries
sq = Subquery(best_reviews.filter(user=OuterRef('id')).values('id')[:1])

# Still no DB query: this queryset is lazy and is embedded below as a nested subquery
user_distinct_best_review_ids = User.objects.all()\
    .annotate(best_review=sq)\
    .values_list('best_review', flat=True)

# Still lazy; the query runs when best_reviews is first evaluated
best_reviews = best_reviews.filter(id__in=user_distinct_best_review_ids)
One way of doing it is as follows:
1. Get a list of tuples that represent the user.id and review.id, ordered by user and number of likes ASCENDING.
2. Convert the list to a dict to remove duplicate user.ids. Later items replace earlier ones, which is why the ordering in step 1 is important.
3. Create a list of review.ids from the values in the dict.
4. Get a queryset using the list of review.ids, ordered by the number of likes DESCENDING.
from django.db.models import Count
user_review_list = Review.objects\
.annotate(num_likes=Count('likereview_review'))\
.order_by('user', 'num_likes')\
.values_list('user', 'pk')
user_review_dict = dict(user_review_list)
review_pk_list = list(user_review_dict.values())
reviews = Review.objects\
.annotate(num_likes=Count('likereview_review'))\
.filter(pk__in=review_pk_list)\
.order_by('-num_likes')
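To see why the ascending order in step 1 matters, here is the dict step in isolation, using plain Python and made-up (user id, review id, like count) rows instead of a queryset. The sample data is hypothetical.

```python
# Hypothetical rows: (user_id, review_id, num_likes)
rows = [(1, 10, 5), (1, 11, 9), (2, 20, 3), (2, 21, 1), (3, 30, 0)]

# Step 1: order by user, then by number of likes ascending
ordered = sorted(rows, key=lambda row: (row[0], row[2]))

# Step 2: later items overwrite earlier ones, so each user keeps
# the review that comes last, i.e. the one with the most likes
user_review_dict = {user: review for user, review, _ in ordered}

# Step 3: the surviving review ids, one per user
review_pk_list = list(user_review_dict.values())
```

With a descending sort in step 1, the dict would keep each user's least liked review instead.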
I have two models: Comment and CommentFlag.
class Comment(models.Model):
    content_type = models.ForeignKey(ContentType,
                                     verbose_name=_('content type'),
                                     related_name="content_type_set_for_%(class)s",
                                     on_delete=models.CASCADE)
    object_pk = models.CharField(_('object ID'), db_index=True, max_length=64)
    content_object = GenericForeignKey(ct_field="content_type", fk_field="object_pk")
    submit_date = models.DateTimeField(_('date/time submitted'), default=None, db_index=True)
    ...
    ...


class CommentFlag(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="comment_flags",
                             on_delete=models.CASCADE)
    comment = models.ForeignKey(Comment, related_name="flags", on_delete=models.CASCADE)
    flag = models.CharField(max_length=30, db_index=True)
    ...
    ...
The flag field of CommentFlag can hold values such as like, dislike, etc.
Problem statement: I want to get all comments sorted by number of likes, in descending order.
Raw query for the above problem statement:
SELECT
    cmnts.*, COALESCE(cmnt_flgs.num_like, 0) AS num_like
FROM
    comments cmnts
LEFT JOIN
    (
        SELECT
            comment_id, COUNT(comment_id) AS num_like
        FROM
            comment_flags
        WHERE
            flag = 'like'
        GROUP BY comment_id
    ) cmnt_flgs
ON
    cmnt_flgs.comment_id = cmnts.id
ORDER BY
    num_like DESC
I have not been able to convert the above query into a Django ORM queryset.
What I have tried so far:
>>> qs = (Comment.objects.filter(flags__flag='like').values('flags__comment_id')
.annotate(num_likes=Count('flags__comment_id')))
which generates a different query:
>>> print(qs.query)
SELECT "comment_flags"."comment_id",
       COUNT("comment_flags"."comment_id") AS "num_likes"
FROM "comments"
INNER JOIN "comment_flags"
    ON ("comments"."id" = "comment_flags"."comment_id")
WHERE "comment_flags"."flag" = 'like'
GROUP BY "comment_flags"."comment_id",
         "comments"."submit_date"
ORDER BY "comments"."submit_date" ASC
LIMIT 21
The problem with the above ORM queryset is that it uses an INNER JOIN, and I also don't understand why it adds submit_date to the GROUP BY clause.
Can you please suggest a way to convert the raw query above into a Django ORM queryset?
You can try using the filter argument of Count:
from django.db.models import Count, Q

qs = (Comment.objects.all()
      .annotate(num_likes=Count('flags__comment_id', filter=Q(flags__flag='like')))
      .order_by('-num_likes'))  # most liked first, as the question asks
It may produce a slightly different query than you're expecting, depending on the database backend, but it should have equivalent behavior.
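To make the "equivalent behavior" claim concrete, the sketch below runs both SQL shapes against an in-memory SQLite database from the standard library: the raw query from the question (LEFT JOIN on a grouped subquery) and a conditional count over a plain LEFT JOIN, which is roughly the shape a filtered Count produces. The table contents are made up, the schema is cut down to the columns involved, and the exact SQL Django emits differs by backend, so treat the second query as an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE comment_flags (id INTEGER PRIMARY KEY, comment_id INTEGER, flag TEXT);
    INSERT INTO comments VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO comment_flags (comment_id, flag) VALUES
        (1, 'like'), (1, 'dislike'), (2, 'like'), (2, 'like'), (3, 'dislike');
""")

# Shape 1: the raw query from the question
raw_rows = conn.execute("""
    SELECT c.id, COALESCE(f.num_like, 0) AS num_like
    FROM comments c
    LEFT JOIN (
        SELECT comment_id, COUNT(comment_id) AS num_like
        FROM comment_flags
        WHERE flag = 'like'
        GROUP BY comment_id
    ) f ON f.comment_id = c.id
    ORDER BY num_like DESC, c.id
""").fetchall()

# Shape 2: conditional aggregation over a plain LEFT JOIN;
# comments without any 'like' flag still appear, with a count of 0
conditional_rows = conn.execute("""
    SELECT c.id,
           COUNT(CASE WHEN f.flag = 'like' THEN f.comment_id END) AS num_like
    FROM comments c
    LEFT JOIN comment_flags f ON f.comment_id = c.id
    GROUP BY c.id
    ORDER BY num_like DESC, c.id
""").fetchall()
```

Both queries keep comment 3, which has no likes. The INNER JOIN with a WHERE clause from the attempted queryset would drop it.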
I am making a little Time Attendance Application in Django.
I'm experimenting with models in my app, and I can't figure out how to do this in Django:
start_time datetime NOT NULL,
end_time datetime NULL,
duration int(11) GENERATED ALWAYS AS (TIMESTAMPDIFF(MINUTE, start_time, end_time)) STORED NULL,
So far, I've made a table called employees with all the employees' details:
from django.db import models


class employees(models.Model):
    employee_id = models.AutoField(primary_key=True)
    first_name = models.CharField(max_length=15)
    last_name = models.CharField(max_length=20)
    user_pic_one = models.ImageField()
    user_pic_two = models.ImageField()
    user_pic_three = models.ImageField()
    age = models.IntegerField()
    national_id = models.CharField(max_length=15)
    join_date = models.DateField()
    pay_structure = models.CharField()  # note: CharField normally needs a max_length
What I want to do is make a new table that has these 5 columns:
employee_id (as a foreign key to the employees class we just made)
start_time = models.TimeField()
end_time = models.TimeField()
duration = models.IntegerField()
date = models.DateField(default=date.today)
So the only two things that I want to know are:
1. How to set up the foreign key so that the employee id in the new table is validated against the employees table.
2. How to calculate the duration in minutes from start_time to end_time.
Thanks :)
To the best of my knowledge, at the moment of writing, Django has no built-in generated columns, that is, columns whose values are calculated on the database side (Django 5.0 later added GeneratedField for this). Usually these are not necessary anyway, since we can annotate the queryset.
We can for example define a manager like:
from django.db import models
from django.db.models import DurationField, ExpressionWrapper, F


class RegistrationManager(models.Manager):
    def get_queryset(self):
        # Every queryset from this manager carries a computed duration column
        return super().get_queryset().annotate(
            duration=ExpressionWrapper(
                F('end_time') - F('start_time'),
                output_field=DurationField(null=True)
            )
        )
Our Registration model could then look like:
class Registration(models.Model):
    employee = models.ForeignKey(Employee, on_delete=models.CASCADE)
    start_time = models.DateTimeField()
    end_time = models.DateTimeField(null=True)

    objects = RegistrationManager()
Each time you access Registration.objects..., Django will annotate the model with an extra column duration that contains the difference between end_time and start_time. It uses a DurationField for that, so the attribute values will be timedelta objects.
You can for example filter the Registration objects on the employee and duration with:
from datetime import timedelta
Registration.objects.filter(employee_id=14, duration__gte=timedelta(minutes=15))
Django will automatically add a foreign key constraint for a ForeignKey field. You should furthermore specify which on_delete=... behavior you want to use. We specified a foreign key to the Employee model, so Django will create a column named employee_id that stores the primary key of the Employee we refer to. You can use some_registration.employee to load the related Employee object into memory. This will usually require an extra query (unless you use .select_related(..) or .prefetch_related(..)).
Note: Model names are normally singular and written in CamelCase, so Employee instead of employees.
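The question asked for the duration in minutes, whereas the annotation yields a timedelta. Converting one into the other needs only the standard library. The helper name duration_minutes below is made up for illustration, and it truncates to whole minutes, which is an assumption about the desired rounding.

```python
from datetime import datetime, timedelta


def duration_minutes(start_time, end_time):
    """Whole minutes between two datetimes, or None while end_time is unset."""
    if end_time is None:
        return None
    # Dividing two timedeltas gives a float; int() truncates to whole minutes
    return int((end_time - start_time) / timedelta(minutes=1))


shift = duration_minutes(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 17, 30))
still_open = duration_minutes(datetime(2024, 1, 1, 9, 0), None)
```

The same division works on the annotated value, for example some_registration.duration / timedelta(minutes=1).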
I have about 100k entries per day, and I output them through an API (with a limit and an offset by default). I want to aggregate the values in my queryset for rows that share an owner_id within the date range, and leave the rest as they are if the owner has no other rows in that range.
What I am doing now doesn't look correct (it calculates some of the data correctly, but some values are inflated for some reason when they should not be):
from django.db.models import Sum

date_delta = date_to - date_from  # <- integer

TrendData.objects.filter(owner__trend_type__mnemonic='posts').filter(
    date_trend__date__range=[date_from, date_to]).values('owner__name').annotate(
    views=(Sum('views') / date_delta),
    views_u=(Sum('views_u') / date_delta),
    likes=(Sum('likes') / date_delta),
    shares=(Sum('shares') / date_delta),
    interaction_rate=(
        Sum('interaction_rate') / date_delta),
)
my models are:
class Owner(models.Model):
    class Meta:
        verbose_name_plural = 'objects'

    TREND_OWNERS = Choices('group', 'user')
    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    owner_type = models.CharField(choices=TREND_OWNERS, max_length=50)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.link}[{self.trend_type}]'


class TrendData(models.Model):
    class Meta:
        verbose_name_plural = 'Trends'

    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.DecimalField(max_digits=20, decimal_places=10)
    mean_age = models.IntegerField()
    source = models.ForeignKey(TrendSource, on_delete=models.CASCADE)
    date_trend = models.DateTimeField()
The parent source model doesn't really help here: it is just the CSV file the data was loaded from, so we never reference it.
What I want to know is whether it is possible to sum views, views_u, likes, shares and interaction_rate only for owners that appear on more than one day within the range (say 01.01.2019 to 10.01.2019). If an owner appears twice in that period, the values should be summed; otherwise the row should be left as it is, without summing all the values in the queryset.
I can do it in Python, but I think it should be possible with the Django ORM.
The Django ORM provides conditional expressions for this kind of condition-based annotation. You can use Case to annotate the Sum based on the condition you mentioned.
from django.db.models import Case, F, IntegerField, Sum, When

TrendData.objects.filter(owner__trend_type__mnemonic='posts').annotate(
    views=Sum(
        Case(
            # Replace the placeholder with a Q object or lookup keyword arguments
            When("Your condition here", then=F('views')),
            default=0,
            output_field=IntegerField(),
        )
    )
    ...
)
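For readers unfamiliar with Sum(Case(When(...))), its effect per group is "add the value when the condition holds, otherwise add the default". The plain-Python sketch below mimics that for a grouping by owner. The rows, the day field and the condition are invented for illustration and are not the asker's actual condition.

```python
from collections import defaultdict

# Hypothetical rows standing in for TrendData records
rows = [
    {"owner": "a", "views": 10, "day": 1},
    {"owner": "a", "views": 30, "day": 2},
    {"owner": "b", "views": 7, "day": 1},
]


def conditional_sum(rows, condition, default=0):
    """Per owner, add views when condition(row) holds, else add the default."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["owner"]] += row["views"] if condition(row) else default
    return dict(totals)


# Only rows from day 2 onwards contribute their views
totals = conditional_sum(rows, lambda row: row["day"] >= 2)
```

Owner "b" still appears in the result with a total of 0, just as a row that fails every When clause contributes the default to the SQL sum.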
I have 4 tables to join: Sales, Personnels, Machines and Locations. I want to join these tables and append a WHERE clause to the ORM query if the request body includes filtering data. Here are my models, the raw query (which I want to write with the Django ORM) and a sample if condition for the WHERE clause:
Models:
from django.db import models


class Machines(models.Model):
    Name = models.CharField(max_length=200)
    Fee = models.DecimalField(max_digits=10,decimal_places=3)

    class Meta:
        db_table = "Machines"


class Personnels(models.Model):
    name = models.CharField(max_length=200)
    surname = models.CharField(max_length=200)

    class Meta:
        db_table = "Personnels"


class Locations(models.Model):
    Latitude = models.FloatField()
    Longitude = models.FloatField()
    LocationName = models.CharField(max_length=1000)

    class Meta:
        db_table = "Locations"


# Defined last so the models it points to already exist
class Sales(models.Model):
    MachineId = models.ForeignKey(Machines,on_delete=models.CASCADE,db_column='MachineId',related_name='%(class)s_Machine')
    PersonnelId = models.ForeignKey(Personnels,on_delete=models.CASCADE,db_column='PersonnelId',related_name='%(class)s_Personnel')
    LocationId = models.ForeignKey(Locations,on_delete=models.CASCADE,db_column='LocationId',related_name='%(class)s_Location')

    class Meta:
        db_table = "Sales"
As you can see, I have 4 models. The "Sales" table has foreign keys to the others. I want to get all the information from these tables by following those foreign keys (with an inner join).
query = '''select * from "Sales" as "SL" INNER JOIN "Personnels" as "PL" ON ("SL"."PersonnelId" = "PL"."user_id") INNER JOIN "Machines" as "MC" ON ("SL"."MachineId" = "MC"."id") INNER JOIN "Locations" as "LC" ON ("SL"."LocationId" = "LC"."id") '''
if request.method=='POST':
    if request.data['personnel_name'] and request.data['personnel_name'] is not None:
        personnel_name = request.data['personnel_name']
        condition = '''WHERE "PL"."name" = '{0}' '''.format(personnel_name)
        query = query+condition
As you can see, there are lots of quotes (if I leave them out, PostgreSQL causes trouble) and the code is not clean.
My question is: how can I write this query using the Django ORM? As you can see, I want to add WHERE conditions dynamically. How can I achieve that?
I'm going to use conventional naming, with only class names capitalized and model names singular.
class Sale(models.Model):
    machine = models.ForeignKey(Machine, on_delete=models.CASCADE)
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    location = models.ForeignKey(Location, on_delete=models.CASCADE)
db_column and db_table are useful if you have to connect the Django app to an existing database. If not, Django will create sensible table and column names by default. The table name can differ from the model name.
To create a join with a WHERE clause, use a queryset filter.
Sale.objects.filter(person__name='Jane Janes')
You might not need more joins, since Django will perform additional queries when needed, but you can request them with select_related, which can give you better performance because it reduces the total number of SQL queries needed.
Sale.objects.filter(person__name='Jane Janes').select_related('machine', 'person', 'location')
It can be useful to inspect the actual SQL that will be run when you evaluate a queryset. You can do this by accessing the QuerySet.query property.
queryset = Sale.objects.select_related('machine').filter(
person__name='Jim', location__name='London')
print(queryset.query)
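For the "add WHERE conditions dynamically" part of the question, one common pattern is to collect lookups in a dict and unpack it into filter(). The sketch below shows only the dict-building step in plain Python so it can run on its own. The helper name build_filters and the location_name request key are hypothetical, and the lookup names assume the renamed models above.

```python
def build_filters(data):
    """Map optional request fields to ORM lookups, skipping missing or empty ones."""
    # Request key -> queryset lookup (the keys on the left are assumptions)
    lookups = {
        "personnel_name": "person__name",
        "location_name": "location__name",
    }
    return {lookup: data[key] for key, lookup in lookups.items() if data.get(key)}


filters = build_filters({"personnel_name": "Jane Janes", "location_name": ""})

# In a view this would be used roughly as:
#   Sale.objects.filter(**build_filters(request.data)).select_related('machine', 'person', 'location')
```

Because the values are passed to filter() rather than formatted into an SQL string, the database driver handles quoting, which also removes the injection risk of the string-concatenation version in the question.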