Complex query using Django QuerySets - python

I am working on a personal project and I am trying to write a complex query that:
1. Gets every device that belongs to a certain user
2. Gets every sensor belonging to every one of the user's devices
3. Gets the last recorded value and timestamp for each of the user's devices' sensors.
I am using SQLite, and I managed to write the query as plain SQL, but for the life of me I cannot figure out how to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
dev.name as device_name,
stype.name as sensor_type,
sen.gpio_port as sensor_port,
sdata.value as sensor_latest_value,
unit.name as sensor_units,
sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
    SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
    FROM devices_sensordata as sd
    INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
    GROUP BY sd.sensor_id
) as sdata on sdata.sensor_id = sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the Django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However, it has two problems:
1. It does not include the sensors that do not yet have any values in SensorData.
2. If I include the SensorData value field in the .values() call, I start to get previously recorded values of the sensors.
If someone could please show me how to do it, or at least tell me what I am doing wrong, I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, if I have in SensorData:
id | sensor_id | value | timestamp
 1 |         1 |     2 | <today>
 2 |         1 |     5 | <yesterday>
 3 |         2 |     3 | <yesterday>
Only the latest should be returned for each sensor_id:
id | sensor_id | value | timestamp
 1 |         1 |     2 | <today>
 3 |         2 |     3 | <yesterday>
Or, if the sensor does not yet have any data in this table, I want the query to return a record for it with "null" for value and timestamp (basically the LEFT JOIN in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
... print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
"devices_device"."name",
"devices_sensortype"."name",
"devices_sensor"."gpio_port",
"devices_sensordata"."value",
"devices_unit"."name",
"devices_sensordata"."timestamp"
FROM
"devices_sensor"
LEFT OUTER JOIN "devices_sensordata" ON (
"devices_sensor"."id" = "devices_sensordata"."sensor_id"
)
INNER JOIN "devices_device" ON (
"devices_sensor"."device_id" = "devices_device"."id"
)
INNER JOIN "devices_sensortype" ON (
"devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
)
INNER JOIN "devices_unit" ON (
"devices_sensortype"."unit_id" = "devices_unit"."id"
)
WHERE
(
NOT EXISTS(
SELECT
U0."id",
U0."sensor_id",
U0."value",
U0."timestamp"
FROM
"devices_sensordata" U0
WHERE
(
U0."sensor_id" = ("devices_sensor"."id")
AND U0."timestamp" > ("devices_sensordata"."timestamp")
)
) = True
AND "devices_device"."user_id" = 1
)
ORDER BY
"devices_device"."name" ASC

Actually your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would do it using annotations and an Exists subquery, in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user in the following way:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now, as you want to retrieve the sensors even if they don't have any data, this query will not suffice, since only SensorData instances are retrieved and the Sensor and Device must be accessed through fields. Unfortunately, Django does not allow explicit joins through its ORM, so I suggest the following (and let me say, it is far from ideal from a performance perspective).
The idea is to annotate the Sensor queryset with the specific values of the latest SensorData (value and timestamp), if any exist, in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))

sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')),
    latest_value_timestamp=Subquery(latest_data.values('timestamp'))
)  # This will generate two subqueries...

certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
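For completeness, a small usage sketch of the queryset above; the only addition is an optional select_related('sensor_type__unit') to avoid per-row queries, everything else comes from the snippets before it:
for sensor in certain_user_sensors.select_related('sensor_type__unit'):
    # latest_value and latest_value_timestamp come back as None for
    # sensors that have no SensorData rows yet.
    print(
        sensor.device.name,
        sensor.sensor_type.name,
        sensor.gpio_port,
        sensor.latest_value,
        sensor.sensor_type.unit.name,
        sensor.latest_value_timestamp,
    )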

For this kind of query I strongly recommend using Q objects; here are the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
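For instance, a minimal sketch of what a Q-object lookup could look like against the models in this question (the 24-hour window and the condition itself are illustrative assumptions, not a solution to the latest-value problem):
from datetime import timedelta

from django.db.models import Q
from django.utils import timezone

# Illustrative only: sensors of user 1 that either have no data yet
# or have at least one reading in the last 24 hours.
yesterday = timezone.now() - timedelta(days=1)
Sensor.objects.filter(
    Q(device__user_id=1)
    & (Q(sensordata__isnull=True) | Q(sensordata__timestamp__gte=yesterday))
).distinct()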

It's perfectly fine to execute raw queries with Django, especially if they are that complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases, Django does no checking on the query.
This means that the security of the query is entirely your responsibility: sanitize the parameters.
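A minimal sketch of the parameterised form with Manager.raw() (the SQL and the user_id variable are illustrative; the point is that params are passed separately so the database driver escapes them, and cursor.execute() accepts params the same way):
# Maps rows onto the Sensor model; never interpolate user input into the SQL string.
sensors = Sensor.objects.raw(
    """
    SELECT sen.*
    FROM devices_sensor AS sen
    INNER JOIN devices_device AS dev ON dev.id = sen.device_id
    WHERE dev.user_id = %s
    """,
    [user_id],
)
for sensor in sensors:
    print(sensor.gpio_port, sensor.device_id)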

Something like this?:
Multiple Devices for 1 User
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(sensor__device__id__in=device_ids).values(
    "sensor__device__name", "sensor__sensor_type__name",
    "value", "timestamp").order_by("-timestamp")
1 Device, 1 User
SensorData.objects.filter(sensor__device__user=user).values(
    "sensor__device__name", "sensor__sensor_type__name",
    "value", "timestamp").order_by("-timestamp")
That queryset will:
1. Get every device that belongs to a certain user.
2. Get every sensor belonging to every one of the user's devices (it returns sensor__sensor_type__name rather than a sensor name, because Sensor itself has no name field).
3. Get all recorded values and timestamps (ordered by the latest timestamp) for each of the user's devices' sensors.
UPDATE
try this:
list_data = []
for _id in device_ids:
    # _id is a device id, so filter on the device rather than the user
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values("sensor__id", "value", "timestamp",
                                  "sensor__device__user__id").latest("timestamp")
        list_data.append(data)

Related

How to implement cross join in django for a count annotation

I present a simplified version of my problem. I have venues and timeslots and users and bookings, as shown in the model descriptions below. Time slots are universal for all venues, and users can book into a time slot at a venue up until the venue capacity is reached.
class Venue(models.Model):
    name = models.CharField(max_length=200)
    capacity = models.PositiveIntegerField(default=0)

class TimeSlot(models.Model):
    start_time = models.TimeField()
    end_time = models.TimeField()

class Booking(models.Model):
    user = models.ForeignKey(User)
    time_slot = models.ForeignKey(TimeSlot)
    venue = models.ForeignKey(Venue)
Now I would like to get, as efficiently as possible, all possible combinations of Venues and TimeSlots, annotated with the count of the bookings made for each combination, including the case where the number of bookings is 0.
I have managed to achieve this in raw SQL using a cross join on the Venue and TimeSlot tables, something to the effect of the query below. However, despite exhaustive searching, I have not been able to find a Django equivalent.
SELECT venue.name, timeslot.start_time, timeslot.end_time, count(booking.id)
FROM myapp_venue as venue
CROSS JOIN myapp_timeslot as timeslot
LEFT JOIN myapp_booking as booking on booking.time_slot_id = timeslot.id
GROUP BY venue.name, timeslot.start_time, timeslot.end_time
I'm also able to annotate the query to retrieve the count of bookings for the combinations where bookings do exist, but the combinations with 0 bookings get excluded. Example:
qs = Booking.objects.all().values(
    venue=F('venue__name'),
    start_time=F('time_slot__start_time'),
    end_time=F('time_slot__end_time')
).annotate(bookings=Count('id')) \
 .order_by('venue', 'start_time', 'end_time')
How can I achieve the effect of the CROSS JOIN query using the Django ORM?
I don't believe Django has the capability to do cross joins without dropping down to raw SQL. I can give you two ideas that could point you in the right direction though:
Combination of queries and Python loops.
venues = Venue.objects.all()
time_slots = TimeSlot.objects.all()
qs = ...  # your custom query from above

# Loop through both querysets to create a master list with a zero count.
venue_time_slots = []
for venue in venues:
    for time_slot in time_slots:
        venue_time_slots.append([venue.name, time_slot.start_time, time_slot.end_time, 0])

# Loop through the master list and compare against the custom qs to update the count.
for venue_time in venue_time_slots:
    for vt in qs:
        # Check if venue and start time match this row of the custom query.
        if venue_time[0] == vt['venue'] and venue_time[1] == vt['start_time']:
            venue_time[3] += vt['bookings']
            break
The harder approach, for which I don't have a solution, is to use a combination of filter, exclude, and union. I have only used this with 3 tables (two parents with a child link table), whereas you have 4 including User, so I can only provide the logic and not an example.
# Get all results that exist in table using .filter().
first_query.filter()
# Get all results that do not exist by using .exclude().
# You can use your results from the first query to exclude also, but
# would need to create an interim list.
exclude_ids = [fq_row.id for fq_row in first_query]
second_query.exclude(id__in=exclude_ids)
# Combine both queries
query = first_query.union(second_query)
return query
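A compact variant of the first idea, assuming the models from the question: fetch the existing counts in a single aggregate query, then build the cross product and pad the missing combinations in Python.
from itertools import product

from django.db.models import Count

# One query for the (venue, time slot) combinations that do have bookings.
counts = {
    (row["venue_id"], row["time_slot_id"]): row["n"]
    for row in Booking.objects.values("venue_id", "time_slot_id").annotate(n=Count("id"))
}

# Cross join in Python, defaulting to 0 where no bookings exist.
rows = [
    (venue.name, slot.start_time, slot.end_time, counts.get((venue.id, slot.id), 0))
    for venue, slot in product(Venue.objects.all(), TimeSlot.objects.all())
]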

Django query with relations

I have a messy and old query that I'm trying to convert from SQL to the Django ORM and I can't seem to figure it out.
As the original query is not something that should be public, here's something similar to what I'm working with:
Table 1
    id
Table 2
    id
    username
    active
    birthday
    table_1_fk
Table 3
    id
    amount
    table_1_fk
I need to end up with a list of active users (username), sorted by date, displaying the amount. Table1 references within table 2 and 3 are not in order. The main issues I'm having are:
How do I retrieve these with just ORM (no looping/executing, or hardly any if I must)
If I can't use solely ORM and do decide to just loop over the parts I need to, how would I even create a single object to display in a table without looping over everything multiple times?
My thought processes:
Table 2 is active -> get table 1 -> find table 1 pk in table 3 -> add table 3 info to table 1?
Table 1 -> get table 2 Actives, Table1 -> get table 3 amounts -> loop to match according to table1_fks
You can follow the related references from Table1. If your models look something like this:
from django.db import models
from django.db.models import F

class Table1(models.Model):
    ...

class Table2(models.Model):
    username = models.CharField(max_length=100)
    active = models.BooleanField()
    birthday = models.DateField()  # Sorted by date
    table1 = models.ForeignKey(Table1, related_name="table2")

class Table3(models.Model):
    amount = models.IntegerField()
    table1 = models.ForeignKey(Table1, related_name="table3")
then you can later do:
>>> users = (
    Table1.objects
    .filter(table2__active=True)
    .annotate(
        username=F("table2__username"),
        amount=F("table3__amount"),
        birthday=F("table2__birthday")
    )
    .order_by("-birthday")
    .values("username", "amount", "birthday")
)
>>> print(users)
[
["user1", 100.0, "2020-01-13"],
["user2", 890.0, "2020-01-10"],
["user3", None, "2020-01-01"],
]
It completely depends on how your models classes are implemented.

Getting database object in Django python

The following is my database model:
id | dept | bp | s | o | d | created_by | created_date
dept and bp together have a unique index for the table. This means there will always be 30 different records in the database under the same dept and bp. I was trying to do an update on the records by getting the object first. The following is the way I tried to get the object:
try:
    Sod_object = Sod.objects.get(dept=dept_name, bp=bp_name)
except Sod.DoesNotExist:
    print "Object doesn't exist"
    msg = "Sod doesn't exist!"
else:
    for s in Sod_object:
        # Do something
But it's always giving me 30 records (obviously). How can I make this a single object? Any suggestions?
In Django, every table in the database will automatically have a column called id. That is the default primary key column and unique for every object in a table.
So you can do
Sod_object = Sod.objects.get(id=1)
to fetch the first object in your table.
To update all the 30 objects for your dept-bp combination, you can filter() by those values
sods = Sod.objects.filter(dept=dept_name, bp=bp_name)
and then update each one and save. Django will internally remember the individual records by their id.
for sod in sods:
    sod.o = 1
    sod.save()
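If every row gets the same value and you don't need per-object logic, a single UPDATE through the queryset is a lighter alternative; note that update() bypasses save() and any model signals:
# One SQL UPDATE instead of one SELECT plus 30 individual saves.
Sod.objects.filter(dept=dept_name, bp=bp_name).update(o=1)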

Django ORM fails to recognise concrete inheritance in nested ON statement

Defining a custom Django user combined with django-taggit, I have run into an ORM issue; I also have this issue in the Django admin filters.
NOTE: I am using this snippet: https://djangosnippets.org/snippets/1034/
# User
id | first_name
---------------------------------
1 | John
2 | Jane
# MyUser
usr_ptr_id | subscription
---------------------------------
1 | 'A'
2 | 'B'
Now when I use the django ORM to filter on certain tags for MyUser, e.g.
MyUser.objects.filter(tags__in=tags)
I get the following error:
(1054, "Unknown column 'myapp_user.id' in 'on clause'")
The printed raw query:
SELECT `myproject_user`.`id`, `myproject_user`.`first_name`, `myapp_user`.`user_ptr_id`, `myapp_user`.`subscription`
FROM `myapp_user` INNER JOIN `myproject_user`
ON ( `myapp_user`.`user_ptr_id` = `myproject_user`.`id` )
INNER JOIN `taggit_taggedtag`
ON ( `myapp_user`.`id` = `taggit_taggedtag`.`object_id`
AND (`taggit_taggedtag`.`content_type_id` = 31))
WHERE (`taggit_taggedtag`.`tag_id`)
IN (SELECT `taggit_tag`.`id` FROM `taggit_tag` WHERE `taggit_tag`.`id` IN (1, 3)))
Changing 'id' to 'user_ptr_id' in the second ON clause makes the query work. Is there any way to force this with the Django ORM?
The issue is that you can't look for an ID in a list of Tags; you need to look for the ID in a list of IDs. To fix this, construct a values_list of all of the IDs you want to filter by, and then pass that list off to your original query instead.
id_list = Tag.objects.all().values_list("id")
MyUser.objects.filter(tags__in=id_list)
If you have a many-to-many relationship between MyUser and Tag, you can also just use the many-to-many manager in place of the whole thing:
MyUser.tags.all()

Grouping by week, and padding out 'missing' weeks

In my Django model, I've got a very simple model which represents a single occurrence of an event (such as a server alert occurring):
class EventOccurrence(models.Model):
    event = models.ForeignKey(Event)
    time = models.DateTimeField()
My end goal is to produce a table or graph that shows how many times an event occurred over the past n weeks.
So my question has two parts:
How can I group_by the week of the time field?
How can I "pad out" the result of this group_by to add a zero-value for any missing weeks?
For example, for the second part, I'd like to transform a result like this:
| week | count |                  | week | count |
|    2 |     3 |                  |    2 |     3 |
|    3 |     5 |  —— becomes —>   |    3 |     5 |
|    5 |     1 |                  |    4 |     0 |
                                  |    5 |     1 |
What's the best way to do this in Django? General Python solutions are also OK.
Django's DateField, as well as Python's datetime, doesn't expose a week attribute. To fetch everything in one query you need to do:
from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT WEEK(`time`) AS 'week', COUNT(*) AS 'count' FROM %s GROUP BY WEEK(`time`) ORDER BY WEEK(`time`)" % EventOccurrence._meta.db_table, [])

data = []
results = cursor.fetchall()
for i, row in enumerate(results[:-1]):
    data.append(row)
    week = row[0] + 1
    next_week = results[i + 1][0]
    while week < next_week:
        data.append((week, 0))
        week += 1
data.append(results[-1])
print data
After digging through the Django query API docs, I haven't found a way to make this query through the Django ORM. A cursor is a workaround, if your database brand is MySQL:
from django.db import connection, transaction

cursor = connection.cursor()
cursor.execute("""
    select
        week(time) as `week`,
        count(*) as `count`
    from EventOccurrence
    group by week(time)
    order by 1;""")
myData = dictfetchall(cursor)
This is, in my opinion, the best-performing solution. But notice that it doesn't pad missing weeks.
EDITED: database-brand-independent solution via Python (lower performance)
If you are looking for database-brand-independent code, then you should take the dates day by day and aggregate them via Python. If this is your case, the code may look like:
# get all weeks:
import datetime

weeks = set()
d7 = datetime.timedelta(days=7)
iterDay = datetime.date(2012, 1, 1)
while iterDay <= datetime.date.today():
    weeks.add(iterDay.isocalendar()[1])
    iterDay += d7

# get all events
allEvents = EventOccurrence.objects.values_list('time', flat=True)

# aggregate events by week
result = dict()
for w in weeks:
    result.setdefault(w, 0)
for e in allEvents:
    result[e.isocalendar()[1]] += 1
(Disclaimer: not tested)
Since I have to query multiple tables by joining them, I'm using a DB view to solve these requirements.
CREATE VIEW my_view
AS
SELECT
    *, -- other fields go here
    YEAR(time_field) as year,
    WEEK(time_field) as week
FROM my_table;
and the model as:
from django.db import models

class MyView(models.Model):
    # other fields go here
    year = models.IntegerField()
    week = models.IntegerField()

    class Meta:
        managed = False
        db_table = 'my_view'

def query():
    rows = MyView.objects.filter(week__range=[2, 5])
    # to handle the rows
After getting rows from this DB view, use the approach from @danihp to pad 0 for the "hole" weeks/months.
NOTE: this is only tested with the MySQL backend; I'm not sure if it's OK for MS SQL Server or others.
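As an aside to the answers above: on newer Django versions (1.11+), ExtractWeek can express the week grouping directly in the ORM. A minimal sketch, assuming the EventOccurrence model from the question; it still does not pad the missing weeks, so that step stays in Python:
from django.db.models import Count
from django.db.models.functions import ExtractWeek

# One row per week that has at least one event, e.g. {'week': 2, 'count': 3}.
per_week = (
    EventOccurrence.objects
    .annotate(week=ExtractWeek('time'))
    .values('week')
    .annotate(count=Count('id'))
    .order_by('week')
)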
