I have a messy, old query that I'm trying to convert from SQL to the Django ORM, and I can't seem to figure it out.
As the original query is not something that should be public, here's something similar to what I'm working with:
Table 1
    id
Table 2
    id
    username
    active
    birthday
    table_1_fk
Table 3
    id
    amount
    table_1_fk
I need to end up with a list of active users (username), sorted by date, displaying the amount. Table1 references within table 2 and 3 are not in order. The main issues I'm having are:
How do I retrieve these with just the ORM (no looping/executing, or hardly any if I must)?
If I can't use the ORM alone and decide to loop over the parts I need, how would I create a single object to display in a table without looping over everything multiple times?
My thought processes:
Table 2 is active -> get table 1 -> find table 1 pk in table 3 -> add table 3 info to table 1?
Table 1 -> get table 2 Actives, Table1 -> get table 3 amounts -> loop to match according to table1_fks
You can follow the relations starting from Table1. If your models look something like this:
from django.db import models
from django.db.models import F


class Table1(models.Model):
    ...


class Table2(models.Model):
    username = models.CharField(max_length=100)
    active = models.BooleanField()
    birthday = models.DateField()  # Sorted by date
    # on_delete is required since Django 2.0; CASCADE is assumed here
    table1 = models.ForeignKey(Table1, on_delete=models.CASCADE, related_name="table2")


class Table3(models.Model):
    amount = models.IntegerField()
    table1 = models.ForeignKey(Table1, on_delete=models.CASCADE, related_name="table3")
You can then run:
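>>> users = (
...     Table1.objects
...     .filter(table2__active=True)
...     .annotate(
...         username=F("table2__username"),
...         amount=F("table3__amount"),
...         birthday=F("table2__birthday"),
...     )
...     .order_by("-birthday")
...     .values("username", "amount", "birthday")
... )
>>> print(list(users))  # .values() yields dicts; output wrapped here for readability
[
    {"username": "user1", "amount": 100, "birthday": datetime.date(2020, 1, 13)},
    {"username": "user2", "amount": 890, "birthday": datetime.date(2020, 1, 10)},
    {"username": "user3", "amount": None, "birthday": datetime.date(2020, 1, 1)},
]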
The exact query depends on how your model classes are defined.
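If you do end up combining the rows in Python, as the question considers, one pass over each table is enough as long as the amounts are indexed by the Table1 id first. This is only a minimal sketch: it assumes the rows were already fetched as dictionaries (for example with `.values()`), and the function name `merge_active_users` and the sample rows are made up for illustration.

```python
from datetime import date


def merge_active_users(table2_rows, table3_rows):
    """Join Table2 and Table3 rows on table_1_fk with one pass over each table."""
    # Index the amounts by Table1 id so each lookup is O(1).
    # Assumption: at most one Table3 row per Table1 id (a later row would overwrite).
    amount_by_table1 = {row["table_1_fk"]: row["amount"] for row in table3_rows}

    merged = [
        {
            "username": row["username"],
            "birthday": row["birthday"],
            # None when Table3 has no row for this Table1 id
            "amount": amount_by_table1.get(row["table_1_fk"]),
        }
        for row in table2_rows
        if row["active"]
    ]
    # Newest date first, matching order_by("-birthday")
    merged.sort(key=lambda item: item["birthday"], reverse=True)
    return merged


# Hypothetical sample data; the foreign keys are deliberately out of order.
table2_rows = [
    {"username": "user3", "active": True, "birthday": date(2020, 1, 1), "table_1_fk": 7},
    {"username": "user1", "active": True, "birthday": date(2020, 1, 13), "table_1_fk": 2},
    {"username": "ghost", "active": False, "birthday": date(2020, 1, 5), "table_1_fk": 4},
]
table3_rows = [
    {"amount": 100, "table_1_fk": 2},
    {"amount": 55, "table_1_fk": 4},
]

result = merge_active_users(table2_rows, table3_rows)
```

The dictionary lookup is what avoids the nested loop: each table is read once, and the final list can be handed to a template as is.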
I'm using SQLAlchemy to manage a database and I'm trying to delete all rows that contain duplicates. The table has an id (primary key) and domain name.
Example:
ID| Domain
1 | example-1.com
2 | example-2.com
3 | example-1.com
In this case I want to delete 1 instance of example-1.com. Sometimes I will need to delete more than 1 but in general the database should not have a domain more than once and if it does, only the first row should be kept and the others should be deleted.
Assuming your model looks something like this:
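import sqlalchemy as sa
from sqlalchemy.orm import declarative_base  # 1.4+; on 1.3 import from sqlalchemy.ext.declarative

Base = declarative_base()


class Domain(Base):
    __tablename__ = 'domain_names'

    id = sa.Column(sa.Integer, primary_key=True)
    domain = sa.Column(sa.String)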
Then you can delete the duplicates like this:
# Create a query that identifies the row for each domain with the lowest id
inner_q = session.query(sa.func.min(Domain.id)).group_by(Domain.domain)
aliased = sa.alias(inner_q)
# Select the rows that do not match the subquery
q = session.query(Domain).filter(~Domain.id.in_(aliased))
# Delete the unmatched rows (at commit, SQLAlchemy can batch the deletes from this
# loop into a single DELETE statement executed with multiple parameter sets)
for domain in q:
    session.delete(domain)
session.commit()
# Show remaining rows
for domain in session.query(Domain):
    print(domain)
print()
If you are not using the ORM, the SQLAlchemy Core equivalent is:
meta = sa.MetaData()
domains = sa.Table('domain_names', meta, autoload=True, autoload_with=engine)
inner_q = sa.select([sa.func.min(domains.c.id)]).group_by(domains.c.domain)
aliased = sa.alias(inner_q)
with engine.connect() as conn:
    conn.execute(domains.delete().where(~domains.c.id.in_(aliased)))
This answer is based on the SQL provided in this answer. There are other ways of deleting duplicates, which you can see in the other answers on the link, or by googling "sql delete duplicates" or similar.
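For reference, the plain SQL that this approach boils down to can be tried with the standard-library `sqlite3` module. This is only a sketch against an in-memory database that mirrors the example table from the question; it is not the SQL that SQLAlchemy emits verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_names (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_names (id, domain) VALUES (?, ?)",
    [(1, "example-1.com"), (2, "example-2.com"), (3, "example-1.com")],
)

# Keep the lowest id for each domain and delete every other row.
conn.execute(
    """
    DELETE FROM domain_names
    WHERE id NOT IN (
        SELECT MIN(id) FROM domain_names GROUP BY domain
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, domain FROM domain_names ORDER BY id"
).fetchall()
```

Here `remaining` holds rows 1 and 2; row 3, the second `example-1.com`, is gone.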
I have an annotation like this, which displays the month-wise count of a field:
bar = Foo.objects.annotate(
    item_count=Count('item')
).order_by('-item_month', '-item_year')
and this produces output like this:
[image: HTML render of the output]
I would like to show the change in item_count when compared with the previous month item_count for each month (except the first month). How could I achieve this using annotations or do I need to use pandas?
Thanks
Edit:
In SQL this is easy with the LAG function, along these lines:
SELECT item_month, item_year, COUNT(item),
LAG(COUNT(item)) OVER (ORDER BY item_month, item_year)
FROM Foo
GROUP BY item_month, item_year
(PS: item_month and item_year are date fields)
Does the Django ORM have something similar to LAG in SQL?
For this type of query you need to use window functions in the Django ORM.
For Lag, see
https://docs.djangoproject.com/en/4.0/ref/models/database-functions/#lag
A working ORM query looks like this:
# models.py
from django.contrib.auth.models import User  # assumed: Django's default User model
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)
This is just a dummy table to show the use of the Lag and Window functions in Django,
because the Django docs do not include an example for Lag.
from django.db.models import Count, F, Window
from django.db.models.functions import ExtractYear, Lag

print(Review.objects.filter().annotate(
    num_likes=Count('likereview_review')
).annotate(
    item_count_lag=Window(expression=Lag(expression=F('num_likes')),
                          order_by=ExtractYear('date_added').asc())
).order_by('-num_likes').distinct().query)
The generated query looks like this:
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`,
       `temp_view_review`.`date_added`, `temp_view_review`.`review_id`,
       COUNT(`temp_view_likereview`.`id`) AS `num_likes`,
       LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY EXTRACT(YEAR FROM `temp_view_review`.`date_added`) ASC) AS `item_count_lag`
FROM `temp_view_review`
LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`)
GROUP BY `temp_view_review`.`review_id`
ORDER BY `num_likes` DESC
Also, if you don't want to order by the extracted year of the date, you can use an F expression like this:
print(Review.objects.filter().annotate(
    num_likes=Count('likereview_review')
).annotate(
    item_count_lag=Window(expression=Lag(expression=F('num_likes')),
                          order_by=[F('date_added')])
).order_by('-num_likes').distinct().query)
The query for this:
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`,
       `temp_view_review`.`date_added`, `temp_view_review`.`review_id`,
       COUNT(`temp_view_likereview`.`id`) AS `num_likes`,
       LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY `temp_view_review`.`date_added`) AS `item_count_lag`
FROM `temp_view_review`
LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`)
GROUP BY `temp_view_review`.`review_id`
ORDER BY `num_likes` DESC
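To see what LAG gives you for the month-over-month change asked about in the question, here is a small sketch using the standard-library `sqlite3` module (window functions need SQLite 3.25 or newer). The `foo` table, its integer `item_year` and `item_month` columns, and the sample rows are assumptions made for the illustration. Note that the window is ordered by year first and then month, so that the previous row really is the previous month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (item TEXT, item_year INTEGER, item_month INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        ("a", 2021, 11),
        ("b", 2021, 12), ("c", 2021, 12), ("d", 2021, 12),
        ("e", 2022, 1), ("f", 2022, 1),
    ],
)

# Count per month in a subquery, then subtract the previous month's count.
rows = conn.execute(
    """
    SELECT item_year, item_month, item_count,
           item_count - LAG(item_count) OVER (ORDER BY item_year, item_month) AS delta
    FROM (
        SELECT item_year, item_month, COUNT(item) AS item_count
        FROM foo
        GROUP BY item_year, item_month
    ) AS monthly
    ORDER BY item_year, item_month
    """
).fetchall()
```

The first month has no predecessor, so its `delta` is NULL (`None` in Python); the later months get +2 and -1 with this sample data.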
There is a many-to-many relationship between A and B.
There are 3 tables to represent that relationship.
TableA, TableB, TableAB
Now I have another table, TableC, which has a foreign key to TableA, and I want to filter the TableC objects whose A has a relationship with at least one B.
The following high-level code gives an idea of what the models look like.
class A:
    value = models.CharField(max_length=255)

class B:
    As = models.ManyToManyField('A', related_name='as')

class C:
    object_a = models.ForeignKey('A')
The SQL query looks like
SELECT
*
FROM
TABLE_C
WHERE (
SELECT
COUNT(*)
FROM
TABLE_AB
WHERE
TABLE_AB.A_id = TABLE_C.A_id
) > 0
I found a solution:
C.objects \
    .annotate(num=Count('object_a__as')) \
    .filter(num__gt=0)
It runs the following query
SELECT
*, COUNT(TABLE_AB.A_id) as "num"
FROM
TABLE_C
LEFT OUTER JOIN
TABLE_AB
ON
TABLE_C.A_id = TABLE_AB.A_id
GROUP BY
TABLE_C.id
HAVING
COUNT(TABLE_AB.B_id) > 0
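The correlated `COUNT(*) > 0` test from the original SQL can also be written as `EXISTS`, which returns each C row once even when its A is linked to several Bs. The sketch below checks that the two forms agree, using the standard-library `sqlite3` module; the lowercase table names and the sample rows are stand-ins invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY);
    CREATE TABLE table_ab (a_id INTEGER, b_id INTEGER);
    CREATE TABLE table_c (id INTEGER PRIMARY KEY, a_id INTEGER);

    INSERT INTO table_a VALUES (1, 'linked twice'), (2, 'not linked');
    INSERT INTO table_b VALUES (10), (11);
    INSERT INTO table_ab VALUES (1, 10), (1, 11);
    INSERT INTO table_c VALUES (100, 1), (101, 2);
    """
)

# Form used in the question: correlated count compared with zero.
count_ids = [row[0] for row in conn.execute(
    """
    SELECT id FROM table_c
    WHERE (SELECT COUNT(*) FROM table_ab WHERE table_ab.a_id = table_c.a_id) > 0
    ORDER BY id
    """
)]

# Equivalent semi-join: stop at the first matching link row.
exists_ids = [row[0] for row in conn.execute(
    """
    SELECT id FROM table_c
    WHERE EXISTS (SELECT 1 FROM table_ab WHERE table_ab.a_id = table_c.a_id)
    ORDER BY id
    """
)]
```

Both lists contain only C row 100, and it appears once although its A is linked to two Bs.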
I'm trying to figure out how to perform the following SQL query with the Django ORM:
SELECT main.A, main.B, main.C
FROM main,
     (SELECT main.A, MAX(main.B) AS B
      FROM main
      GROUP BY main.A) subq
WHERE main.A = subq.A
  AND main.B = subq.B
The last two lines are necessary because they recover the column C value when B is at a maximum in the group by. Without them, I would have A and the corresponding Max B but not the C value when B is at its max. I have searched extensively but cannot find an example that can construct this query using the Django ORM. Most examples use Django's Subquery class and show how to match the sub-queryset up with one column (so doing main.A = subq.A). But how do I match 2+ columns?
Edit:
Here is the model class:
class Tweets(models.Model):
    tweet_id = models.AutoField(primary_key=True)
    tweet_date = models.DateTimeField(blank=True)
    candidate = models.CharField(max_length=100)
    district = models.IntegerField(blank=True)
    username = models.CharField(max_length=256)
    likes = models.IntegerField(blank=True)
    tweet_text = models.CharField(max_length=560)
I'd like to group by "candidate" and "district", then find the tweet with the most likes. But I'd also like to know the "username" and "tweet_text" associated with that tweet that had the most likes.
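As a reference point for what the ORM query has to reproduce, the join on more than one column can be tried directly in SQL. The sketch below uses the standard-library `sqlite3` module with a cut-down `tweets` table; the table name and the sample rows are invented for the example, and it shows the target SQL rather than a Django solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, candidate TEXT, "
    "district INTEGER, username TEXT, likes INTEGER, tweet_text TEXT)"
)
conn.executemany(
    "INSERT INTO tweets (candidate, district, username, likes, tweet_text) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Smith", 1, "alice", 10, "first"),
        ("Smith", 1, "bob", 25, "second"),
        ("Jones", 2, "carol", 7, "third"),
    ],
)

# Join each tweet to its group's maximum on all grouping columns plus the max itself.
top_tweets = conn.execute(
    """
    SELECT t.candidate, t.district, t.username, t.likes, t.tweet_text
    FROM tweets AS t
    JOIN (
        SELECT candidate, district, MAX(likes) AS max_likes
        FROM tweets
        GROUP BY candidate, district
    ) AS best
      ON t.candidate = best.candidate
     AND t.district = best.district
     AND t.likes = best.max_likes
    ORDER BY t.candidate
    """
).fetchall()
```

If two tweets in the same group tie for the most likes, this join returns both of them.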
I am working on a personal project and I am trying to write a complex query that:
Gets every device that belongs to a certain user
Gets every sensor belonging to every one of the user's devices
Gets the last recorded value and timestamp for each of the sensors on the user's devices.
I am using SQLite, and I managed to write the query as plain SQL; however, for the life of me I cannot figure out a way to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
dev.name as device_name,
stype.name as sensor_type,
sen.gpio_port as sensor_port,
sdata.value as sensor_latest_value,
unit.name as sensor_units,
sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
FROM devices_sensordata as sd
INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However it has two problems:
It does not include the sensors that do not yet have any values in SensorData
If I include the SensorData value field in .values(), I start to get previously recorded values of the sensors
If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, I have this in sensordata:
id | sensor_id | value | timestamp|
1 | 1 | 2 | <today> |
2 | 1 | 5 | <yesterday>|
3 | 2 | 3 | <yesterday>|
Only the latest should be returned for each sensor_id:
id | sensor_id | value | timestamp |
1 | 1 | 2 | <today> |
3 | 2 | 3 | <yesterday>|
Or, if the sensor does not yet have any data in this table, I want the query to return a record of it with "null" for value and timestamp (basically the left join in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
...     print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
"devices_device"."name",
"devices_sensortype"."name",
"devices_sensor"."gpio_port",
"devices_sensordata"."value",
"devices_unit"."name",
"devices_sensordata"."timestamp"
FROM
"devices_sensor"
LEFT OUTER JOIN "devices_sensordata" ON (
"devices_sensor"."id" = "devices_sensordata"."sensor_id"
)
INNER JOIN "devices_device" ON (
"devices_sensor"."device_id" = "devices_device"."id"
)
INNER JOIN "devices_sensortype" ON (
"devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
)
INNER JOIN "devices_unit" ON (
"devices_sensortype"."unit_id" = "devices_unit"."id"
)
WHERE
(
NOT EXISTS(
SELECT
U0."id",
U0."sensor_id",
U0."value",
U0."timestamp"
FROM
"devices_sensordata" U0
WHERE
(
U0."sensor_id" = ("devices_sensor"."id")
AND U0."timestamp" > ("devices_sensordata"."timestamp")
)
) = True
AND "devices_device"."user_id" = 1
)
ORDER BY
"devices_device"."name" ASC
Actually, your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use annotations and an Exists subquery, in the following way:
from django.db.models import Exists, OuterRef

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now, since you want to retrieve the sensors even if they don't have any data, this query will not suffice: only SensorData instances are retrieved, and the Sensor and Device must be accessed through their fields. Unfortunately, Django does not allow explicit joins through its ORM. Therefore I suggest the following (which, let me say, is far from ideal from a performance perspective).
The idea is to annotate the Sensor queryset with the specific values of the latest SensorData (value and timestamp), if any exists, in the following way:
from django.db.models import Exists, OuterRef, Subquery

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))
# [:1] guards against two rows sharing the latest timestamp for one sensor
sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')[:1]),
    latest_value_timestamp=Subquery(latest_data.values('timestamp')[:1])
)  # This will generate two subqueries...
certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
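The shape of the SQL behind this idea, two correlated subqueries with a NOT EXISTS test for "no newer row", can be checked with the standard-library `sqlite3` module. The sketch below uses simplified `sensor` and `sensordata` tables with made-up dates and a hypothetical helper `latest_subquery`; it is an approximation of the approach, not the exact SQL that Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sensor (id INTEGER PRIMARY KEY, gpio_port INTEGER);
    CREATE TABLE sensordata (
        id INTEGER PRIMARY KEY, sensor_id INTEGER, value REAL, timestamp TEXT
    );
    INSERT INTO sensor VALUES (1, 4), (2, 17), (3, 27);
    INSERT INTO sensordata VALUES
        (1, 1, 2.0, '2019-06-02'),
        (2, 1, 5.0, '2019-06-01'),
        (3, 2, 3.0, '2019-06-01');
    """
)


def latest_subquery(column):
    """Build a correlated subquery for one column of a sensor's newest reading."""
    # `column` is a fixed literal chosen below, never user input.
    return f"""(
        SELECT d.{column} FROM sensordata AS d
        WHERE d.sensor_id = s.id
          AND NOT EXISTS (
              SELECT 1 FROM sensordata AS newer
              WHERE newer.sensor_id = d.sensor_id
                AND newer.timestamp > d.timestamp
          )
    )"""


latest = conn.execute(
    f"""
    SELECT s.id,
           {latest_subquery('value')} AS latest_value,
           {latest_subquery('timestamp')} AS latest_timestamp
    FROM sensor AS s
    ORDER BY s.id
    """
).fetchall()
```

Sensor 3 has no readings, so it still appears in `latest`, with `None` for both annotated columns.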
For this kind of query, I strongly recommend using Q objects; here are the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
It's perfectly fine to execute raw queries with Django, especially if they are that complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases, Django does no checking on the query.
This means that the security of the query is entirely your responsibility: sanitize the parameters.
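In practice the safest way to handle the parameters is to pass them separately from the SQL text, so the driver binds them as data. Django's `Manager.raw()` and `cursor.execute()` both accept a `params` argument for this, with `%s` placeholders. The sketch below shows the same idea with the standard-library `sqlite3` module, which uses `?` placeholders; the table contents and the function name `emails_for` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO accounts_user VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)


def emails_for(user_id):
    """Look up emails by id with the value bound as a parameter."""
    # The driver sends user_id as data, so it is never parsed as SQL.
    cursor = conn.execute(
        "SELECT email FROM accounts_user WHERE id = ?", (user_id,)
    )
    return [row[0] for row in cursor]


safe_result = emails_for(1)
# A hostile string is compared as a plain value and matches nothing.
injection_result = emails_for("1 OR 1=1")
```

Had the same string been formatted into the SQL text, the `OR 1=1` would have become part of the WHERE clause and matched every row.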
Something like this?:
Multiple Devices for 1 User
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(
    sensor__device__id__in=device_ids
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
1 Device, 1 User
SensorData.objects.filter(
    sensor__device__user=user
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
That queryset will:
1. Get every device that belongs to a certain user
2. Get every sensor belonging to each of the user's devices (it returns the sensor type's name for each sensor, because Sensor has no name field)
3. Get all recorded values and timestamps (ordered by latest timestamp first) for each of the sensors on the user's devices.
UPDATE
Try this:
list_data = []
for _id in device_ids:
    # device_ids holds Device ids, so filter on the device rather than the user
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values("sensor__id", "value", "timestamp", "sensor__device__user__id").latest("timestamp")
        list_data.append(data)