Django ORM fails to recognise concrete inheritance in nested ON statement - python

While defining a custom Django user model combined with django-taggit, I have run into an ORM issue. I also have this issue in the Django admin filters.
NOTE: I am using this snippet: https://djangosnippets.org/snippets/1034/
# User
id | first_name
---------------------------------
1 | John
2 | Jane
# MyUser
usr_ptr_id | subscription
---------------------------------
1 | 'A'
2 | 'B'
Now, when I use the Django ORM to filter on certain tags for MyUser, e.g.
MyUser.objects.filter(tags__in=tags)
I get the following error:
(1054, "Unknown column 'myapp_user.id' in 'on clause'")
The printed raw query:
SELECT `myproject_user`.`id`, `myproject_user`.`first_name`, `myapp_user`.`user_ptr_id`, `myapp_user`.`subscription`
FROM `myapp_user` INNER JOIN `myproject_user`
ON ( `myapp_user`.`user_ptr_id` = `myproject_user`.`id` )
INNER JOIN `taggit_taggedtag`
ON ( `myapp_user`.`id` = `taggit_taggedtag`.`object_id`
AND (`taggit_taggedtag`.`content_type_id` = 31))
WHERE (`taggit_taggedtag`.`tag_id`)
IN (SELECT `taggit_tag`.`id` FROM `taggit_tag` WHERE `taggit_tag`.`id` IN (1, 3)))
Changing 'id' to 'user_ptr_id' in the second ON clause makes the query work. Is there any way to force this with the Django ORM?
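To see why the rewritten ON clause works, here is a minimal sqlite3 reproduction. It is a sketch under assumptions: the tables are cut down to the columns the printed query uses, the tagged rows are hypothetical, and SQLite stands in for MySQL. With multi-table inheritance the child table's primary key is `user_ptr_id`, so the child table simply has no `id` column to join on.

```python
import sqlite3

# Hypothetical cut-down schema mirroring Django multi-table inheritance:
# the child table's primary key is user_ptr_id and it has no "id" column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myproject_user (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE myapp_user (
    user_ptr_id INTEGER PRIMARY KEY REFERENCES myproject_user(id),
    subscription TEXT
);
CREATE TABLE taggit_taggedtag (
    id INTEGER PRIMARY KEY, tag_id INTEGER,
    object_id INTEGER, content_type_id INTEGER
);
INSERT INTO myproject_user VALUES (1, 'John'), (2, 'Jane');
INSERT INTO myapp_user VALUES (1, 'A'), (2, 'B');
INSERT INTO taggit_taggedtag VALUES (1, 1, 1, 31), (2, 2, 2, 31), (3, 3, 2, 31);
""")


def run(join_column):
    # Same shape as the printed query; join_column is a fixed internal
    # choice here, never user input.
    sql = f"""
        SELECT p.first_name, c.subscription
        FROM myapp_user c
        INNER JOIN myproject_user p ON c.user_ptr_id = p.id
        INNER JOIN taggit_taggedtag t
            ON c.{join_column} = t.object_id AND t.content_type_id = 31
        WHERE t.tag_id IN (1, 3)
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()


try:
    run("id")  # the generated query: the child table has no "id" column
except sqlite3.OperationalError as exc:
    print("fails:", exc)

print(run("user_ptr_id"))  # [('John', 'A'), ('Jane', 'B')]
```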

The issue is that you can't look for an ID in a list of Tags; you need to look for the ID in a list of IDs. To fix this, construct a values_list of all of the IDs you want to filter by, and then pass that list to your original query instead.
id_list = Tag.objects.values_list("id", flat=True)
MyUser.objects.filter(tags__in=id_list)
If you have a many-to-many relationship between MyUser and Tag, you can also just use the many-to-many manager in place of the whole thing:
MyUser.tags.all()

Related

SQLAlchemy: Delete all duplicate rows [duplicate]

I'm using SQLAlchemy to manage a database and I'm trying to delete all rows that contain duplicates. The table has an id (primary key) and domain name.
Example:
ID| Domain
1 | example-1.com
2 | example-2.com
3 | example-1.com
In this case I want to delete one instance of example-1.com. Sometimes I will need to delete more than one, but in general the database should not contain a domain more than once; if it does, only the first row should be kept and the others deleted.
Assuming your model looks something like this:
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class Domain(Base):
    __tablename__ = 'domain_names'
    id = sa.Column(sa.Integer, primary_key=True)
    domain = sa.Column(sa.String)
Then you can delete the duplicates like this:
# Create a query that identifies the row for each domain with the lowest id
inner_q = session.query(sa.func.min(Domain.id)).group_by(Domain.domain)
aliased = sa.alias(inner_q)
# Select the rows that do not match the subquery
q = session.query(Domain).filter(~Domain.id.in_(aliased))
# Delete the unmatched rows (SQLAlchemy generates a single DELETE statement from this loop)
for domain in q:
    session.delete(domain)
session.commit()
# Show remaining rows
for domain in session.query(Domain):
    print(domain)
print()
If you are not using the ORM, the core equivalent is:
meta = sa.MetaData()
domains = sa.Table('domain_names', meta, autoload=True, autoload_with=engine)
inner_q = sa.select([sa.func.min(domains.c.id)]).group_by(domains.c.domain)
aliased = sa.alias(inner_q)
with engine.connect() as conn:
    conn.execute(domains.delete().where(~domains.c.id.in_(aliased)))
This answer is based on the SQL provided in this answer. There are other ways of deleting duplicates, which you can see in the other answers on the link, or by googling "sql delete duplicates" or similar.
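For reference, here is the plain SQL that both versions boil down to, run through Python's built-in sqlite3 module. The table name matches the model above and the rows are the three from the question. It is a sketch of the technique (keep MIN(id) per domain, delete the rest), not a drop-in replacement for the SQLAlchemy code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_names (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_names (id, domain) VALUES (?, ?)",
    [(1, "example-1.com"), (2, "example-2.com"), (3, "example-1.com")],
)

# Keep the lowest id for each domain and delete every other row.
conn.execute("""
    DELETE FROM domain_names
    WHERE id NOT IN (SELECT MIN(id) FROM domain_names GROUP BY domain)
""")
conn.commit()

remaining = conn.execute("SELECT id, domain FROM domain_names ORDER BY id").fetchall()
print(remaining)  # [(1, 'example-1.com'), (2, 'example-2.com')]
```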

Django query with relations

I have a messy and old query that I'm trying to convert from SQL to Django ORM and I can't seem to figure it out.
As the original query is not something that should be public, here's something similar to what I'm working with:
Table 1: id
Table 2: id, username, active, birthday, table_1_fk
Table 3: id, amount, table_1_fk
I need to end up with a list of active users (username), sorted by date, displaying the amount. The Table 1 references within Table 2 and Table 3 are not in order. The main issues I'm having are:
1. How do I retrieve these with just the ORM (no looping/executing, or hardly any if I must)?
2. If I can't use solely the ORM and decide to just loop over the parts I need, how would I create a single object to display in a table without looping over everything multiple times?
My thought process:
Table 2 is active -> get Table 1 -> find Table 1 pk in Table 3 -> add Table 3 info to Table 1?
Table 1 -> get Table 2 actives, Table 1 -> get Table 3 amounts -> loop to match according to table_1_fks
You can follow the reverse relations from Table1. If your models look something like this:
from django.db import models
from django.db.models import F
class Table1(models.Model):
    ...
class Table2(models.Model):
    username = models.CharField(max_length=100)
    active = models.BooleanField()
    birthday = models.DateField()  # Sorted by date
    table1 = models.ForeignKey(Table1, related_name="table2", on_delete=models.CASCADE)
class Table3(models.Model):
    amount = models.IntegerField()
    table1 = models.ForeignKey(Table1, related_name="table3", on_delete=models.CASCADE)
you can then do:
>>> users = (
Table1.objects
.filter(table2__active=True)
.annotate(
username=F("table2__username"),
amount=F("table3__amount"),
birthday=F("table2__birthday")
)
.order_by("-birthday")
.values("username", "amount", "birthday")
)
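>>> print(users)
<QuerySet [
    {'username': 'user1', 'amount': 100, 'birthday': datetime.date(2020, 1, 13)},
    {'username': 'user2', 'amount': 890, 'birthday': datetime.date(2020, 1, 10)},
    {'username': 'user3', 'amount': None, 'birthday': datetime.date(2020, 1, 1)},
]>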
The exact query depends on how your model classes are implemented.
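To make the shape of that query concrete, here is roughly the SQL it corresponds to, run with sqlite3. Assumptions: the table and column names (`table1`, `table2.table1_id`, and so on) and the rows are hypothetical, chosen so the result matches the example output above; Django would actually prefix table names with the app label. The filter on `table2__active` needs a matching Table2 row, while the reverse `table3` relation should become a LEFT OUTER JOIN, which is why `user3` comes back with no amount. If one Table1 row has several Table3 rows, you get one result row per amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY);
CREATE TABLE table2 (
    id INTEGER PRIMARY KEY, username TEXT, active INTEGER,
    birthday TEXT, table1_id INTEGER REFERENCES table1(id)
);
CREATE TABLE table3 (
    id INTEGER PRIMARY KEY, amount INTEGER,
    table1_id INTEGER REFERENCES table1(id)
);
-- foreign keys deliberately out of order, as in the question
INSERT INTO table1 (id) VALUES (1), (2), (3), (4);
INSERT INTO table2 VALUES
    (1, 'user1', 1, '2020-01-13', 2),
    (2, 'user2', 1, '2020-01-10', 1),
    (3, 'user3', 1, '2020-01-01', 3),
    (4, 'user4', 0, '2020-01-20', 4);
INSERT INTO table3 VALUES (1, 890, 1), (2, 100, 2), (3, 50, 4);
""")

rows = conn.execute("""
    SELECT t2.username, t3.amount, t2.birthday
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.table1_id = t1.id
    LEFT OUTER JOIN table3 t3 ON t3.table1_id = t1.id  -- keeps users with no amount
    WHERE t2.active = 1
    ORDER BY t2.birthday DESC
""").fetchall()
print(rows)
# [('user1', 100, '2020-01-13'), ('user2', 890, '2020-01-10'), ('user3', None, '2020-01-01')]
```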

Complex query using Django QuerySets

I am working on a personal project and I am trying to write a complex query that:
Gets every device that belongs to a certain user
Gets every sensor belonging to every one of the user's devices
Gets the last recorded value and timestamp for each of the user's devices sensors.
I am using SQLite, and I managed to write the query as plain SQL; however, for the life of me I cannot figure out a way to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()
class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()
class Unit(models.Model):
    name = models.CharField()
class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)
class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)
class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
dev.name as device_name,
stype.name as sensor_type,
sen.gpio_port as sensor_port,
sdata.value as sensor_latest_value,
unit.name as sensor_units,
sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
FROM devices_sensordata as sd
INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However it has two problems:
It does not include the sensors that do not yet have any values in SensorData
If I include the SensorData value field in the .values() call, I start to get previously recorded values of the sensors
If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, I have this in SensorData:
id | sensor_id | value | timestamp|
1 | 1 | 2 | <today> |
2 | 1 | 5 | <yesterday>|
3 | 2 | 3 | <yesterday>|
Only the latest should be returned for each sensor_id:
id | sensor_id | value | timestamp |
1 | 1 | 2 | <today> |
3 | 2 | 3 | <yesterday>|
Or, if the sensor does not yet have any data in this table, I want the query to return a record for it with "null" for value and timestamp (basically the LEFT JOIN in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
... print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
"devices_device"."name",
"devices_sensortype"."name",
"devices_sensor"."gpio_port",
"devices_sensordata"."value",
"devices_unit"."name",
"devices_sensordata"."timestamp"
FROM
"devices_sensor"
LEFT OUTER JOIN "devices_sensordata" ON (
"devices_sensor"."id" = "devices_sensordata"."sensor_id"
)
INNER JOIN "devices_device" ON (
"devices_sensor"."device_id" = "devices_device"."id"
)
INNER JOIN "devices_sensortype" ON (
"devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
)
INNER JOIN "devices_unit" ON (
"devices_sensortype"."unit_id" = "devices_unit"."id"
)
WHERE
(
NOT EXISTS(
SELECT
U0."id",
U0."sensor_id",
U0."value",
U0."timestamp"
FROM
"devices_sensordata" U0
WHERE
(
U0."sensor_id" = ("devices_sensor"."id")
AND U0."timestamp" > ("devices_sensordata"."timestamp")
)
) = True
AND "devices_device"."user_id" = 1
)
ORDER BY
"devices_device"."name" ASC
Actually, your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use annotations and an Exists subquery in the following way:
from django.db.models import Exists, OuterRef
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user in the following way:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now, since you want to retrieve the sensors even if they don't have any data, this query will not suffice: only SensorData instances are retrieved, and the Sensor and Device must be accessed through fields. Unfortunately, Django does not allow explicit joins through its ORM. Therefore I suggest the following (and let me say, this is far from ideal from a performance perspective).
The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:
from django.db.models import Exists, OuterRef, Subquery
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))
sensors_with_value = Sensor.objects.annotate(
    # [:1] guards against two readings sharing the same latest timestamp
    latest_value=Subquery(latest_data.values('value')[:1]),
    latest_value_timestamp=Subquery(latest_data.values('timestamp')[:1])
)  # This will generate two subqueries...
certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
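The "latest row per sensor, or NULL if there is none" logic that both the EDIT2 query and this answer rely on can be checked in isolation with sqlite3. This sketch keeps only the `sensor` and `sensordata` columns that matter; the table names, GPIO ports, and dates are hypothetical stand-ins for the <today>/<yesterday> rows in the question. The LEFT OUTER JOIN keeps sensors with no data, and NOT EXISTS drops any reading for which a newer one exists. Two readings with exactly the same latest timestamp would both survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor (id INTEGER PRIMARY KEY, gpio_port INTEGER);
CREATE TABLE sensordata (
    id INTEGER PRIMARY KEY,
    sensor_id INTEGER REFERENCES sensor(id),
    value REAL,
    timestamp TEXT
);
INSERT INTO sensor VALUES (1, 4), (2, 17), (3, 27);  -- sensor 3 has no data
INSERT INTO sensordata VALUES
    (1, 1, 2, '2024-01-02'),
    (2, 1, 5, '2024-01-01'),
    (3, 2, 3, '2024-01-01');
""")

rows = conn.execute("""
    SELECT s.id, d.value, d.timestamp
    FROM sensor s
    LEFT OUTER JOIN sensordata d ON d.sensor_id = s.id
    WHERE NOT EXISTS (
        -- is there a newer reading for the same sensor?
        SELECT 1 FROM sensordata newer
        WHERE newer.sensor_id = s.id AND newer.timestamp > d.timestamp
    )
    ORDER BY s.id
""").fetchall()
print(rows)  # [(1, 2.0, '2024-01-02'), (2, 3.0, '2024-01-01'), (3, None, None)]
```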
For this kind of query, I strongly recommend using Q objects; see the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
It's perfectly fine to execute raw queries with Django, especially ones this complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases Django does no checking of the query.
This means the security of the query is entirely your responsibility: pass user input as query parameters instead of formatting it into the SQL string.
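A small illustration of passing values as query parameters, using the standard-library sqlite3 driver. Django's `connection.cursor()` follows the same idea, except that its placeholder is `%s` rather than `?`. The table and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO accounts_user (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)


def user_ids_for(email):
    # The value travels separately from the SQL text, so the driver binds it
    # as data; it is never spliced into the statement.
    return conn.execute(
        "SELECT id FROM accounts_user WHERE email = ?", (email,)
    ).fetchall()


print(user_ids_for("a@example.com"))  # [(1,)]
print(user_ids_for("' OR '1'='1"))    # [] : compared as a literal string
```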
Something like this?:
Multiple Devices for 1 User
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(sensor__device__id__in=device_ids
).values("sensor__device__name", "sensor__sensor_type__name",
"value","timestamp").order_by("-timestamp")
1 Device, 1 User
SensorData.objects.filter(sensor__device__user=user
).values("sensor__device__name", "sensor__sensor_type__name",
"value", "timestamp").order_by("-timestamp")
That queryset will:
1. Get every device that belongs to a certain user
2. Get every sensor belonging to each of the user's devices (Sensor has no name field, so the sensor type name is returned for each sensor instead)
3. Get every recorded value and timestamp for each of the user's device sensors, ordered by timestamp, latest first.
UPDATE
Try this:
list_data = []
for _id in device_ids:
    # _id is a device id, so filter on the device, not on the user
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values("sensor__id", "value", "timestamp", "sensor__device__user__id").latest("timestamp")
        list_data.append(data)

Getting database object in Django python

The following is my database model:
id | dept | bp | s | o | d | created_by | created_date
dept and bp together have a unique index for the table. This means there will always be 30 different records in the database under the same dept and bp. I was trying to write an update function for the records by getting the object first. The following is the way I tried to get the object:
try:
    Sod_object = Sod.objects.get(dept=dept_name, bp=bp_name)
except Sod.DoesNotExist:
    print("Object doesn't exist")
    msg = "Sod doesn't exist!"
else:
    for s in Sod_object:
        pass  # Do something
But it's always giving me 30 records (obviously). How can I make this a single object? Any suggestions?
In Django, every table automatically gets a column called id unless you declare your own primary key. That is the default primary key column, and it is unique for every object in a table.
So you can do
Sod_object = Sod.objects.get(id=1)
to fetch the object whose id is 1.
To update all the 30 objects for your dept-bp combination, you can filter() by those values
sods = Sod.objects.filter(dept=dept_name, bp=bp_name)
and then update each one and save. Django will internally remember the individual records by their id.
for sod in sods:
    sod.o = 1
    sod.save()
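If every matching row gets the same new value, Django's `QuerySet.update()` (for example `sods.update(o=1)`) does the same job in a single SQL UPDATE and returns the number of rows matched. It does not call `save()` or send the `pre_save`/`post_save` signals. The sketch below shows the equivalent statement with sqlite3; the `sod` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sod (id INTEGER PRIMARY KEY, dept TEXT, bp TEXT, o INTEGER)"
)
conn.executemany(
    "INSERT INTO sod (dept, bp, o) VALUES (?, ?, ?)",
    [("sales", "bp1", 0)] * 3 + [("hr", "bp1", 0)],
)

# One UPDATE touches every row for the dept/bp pair, roughly what
# Sod.objects.filter(dept=dept_name, bp=bp_name).update(o=1) sends.
cur = conn.execute(
    "UPDATE sod SET o = ? WHERE dept = ? AND bp = ?", (1, "sales", "bp1")
)
conn.commit()
print(cur.rowcount)  # 3
```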

Django ORM: sort by aggregate of filter of related table

Here's a subset of my model:
class Case(models.Model):
    ...  # primary key is named "id"
class Employee(models.Model):
    ...  # primary key is named "id"
class Report(models.Model):
    case = ForeignKey(Case, null=True)
    employee = ForeignKey(Employee)
    date = DateField()
Given a particular employee, I want to produce a list of all cases, ordered by when the employee has most recently reported on it. Those cases for which no report exists should be sorted last. Cases on the same date (including NULL) should be sorted by further criteria.
Can I express this in the Django ORM api? If so, how?
In pseudo-SQL, I think I want
Select Case.*
From Case some-kind-of-join Report
Where report.employee_id = the_given_employee_id
Group by Case.id
Order by Max(Report.date) Desc /* Report-less cases last */, Case.id /* etc. */
Do I need to introduce a many-to-many relation from Case to Employee through Report to do this in Django ORM?
Every relationship in a django model has a reverse relationship that can be easily queried (including when you are ordering) so you can do something like:
Case.objects.all().order_by('-report__date', 'another_field', 'a_third_field')
but this won't get you any information about a single particular employee. You could do this:
Case.objects.filter(report__employee__pk=5).order_by('-report__date', 'another_field', 'a_third_field')
but this won't return any Case objects that aren't edited by your particular employee.
Unfortunately, without native subquery support (Django added Subquery and OuterRef in 1.11), you have to write a custom annotation query to perform the subquery, i.e. the most recent report date by that employee for each case. This is untested, but it shows the general idea:
Case.objects.extra(
    select={
        "employee_last_edit": """
            SELECT MAX(app_report.date)
            FROM app_report
            WHERE app_report.case_id = app_case.id
              AND app_report.employee_id = %s
        """
    },
    select_params=(employee.id,),
).order_by('-employee_last_edit', 'something_else')
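The correlated subquery can be checked with plain SQL. This sqlite3 sketch uses the `app_case`/`app_report` table names from the query above with hypothetical rows and employee ids. It sorts on `employee_last_edit IS NULL` first so that report-less cases go last regardless of how the database orders NULLs (SQLite and MySQL put NULLs last in DESC order; PostgreSQL puts them first). On Django 1.11 or later, the same idea can be expressed with `Subquery`/`OuterRef` and `F('employee_last_edit').desc(nulls_last=True)` instead of `extra()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_case (id INTEGER PRIMARY KEY);
CREATE TABLE app_report (
    id INTEGER PRIMARY KEY,
    case_id INTEGER REFERENCES app_case(id),
    employee_id INTEGER,
    date TEXT
);
INSERT INTO app_case (id) VALUES (1), (2), (3), (4);
INSERT INTO app_report (case_id, employee_id, date) VALUES
    (1, 5, '2020-01-01'),
    (1, 5, '2020-03-01'),
    (2, 5, '2020-02-01'),
    (3, 7, '2020-04-01');
""")

EMPLOYEE_ID = 5  # hypothetical employee

rows = conn.execute("""
    SELECT id, employee_last_edit FROM (
        SELECT c.id AS id,
               -- correlated on the outer case, restricted to one employee
               (SELECT MAX(r.date) FROM app_report r
                WHERE r.case_id = c.id AND r.employee_id = ?) AS employee_last_edit
        FROM app_case c
    ) AS t
    ORDER BY employee_last_edit IS NULL,  -- report-less cases last
             employee_last_edit DESC,
             id                           -- stand-in for "something_else"
""", (EMPLOYEE_ID,)).fetchall()
print(rows)  # [(1, '2020-03-01'), (2, '2020-02-01'), (3, None), (4, None)]
```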
