Fetching most recent related object for set of objects in Peewee - python

Suppose I have an object model A with a one-to-many relationship with B in Peewee, using an SQLite backend. I want to fetch some set of A and join each with its most recent B. Is there a way to do this without looping?
class A(Model):
    some_field = CharField()

class B(Model):
    a = ForeignKeyField(A)
    date = DateTimeField(default=datetime.datetime.now)
The naive way would be to call order_by and limit(1), but that would apply to the entire query, so
q = A.select().join(B).order_by(B.date.desc()).limit(1)
will naturally produce a singleton result, as will
q = B.select().order_by(B.date.desc()).limit(1).join(A)
I am either using prefetch wrong or it doesn't work for this, because
q1 = A.select()
q2 = B.select().order_by(B.date.desc()).limit(1)
q3 = prefetch(q1,q2)
len(q3[0].a_set)
len(q3[0].a_set_prefetch)
Neither of those sets has length 1, as desired. Does anyone know how to do this?

I realize I needed to understand functions and group_by.
q = B.select().join(A).group_by(A).having(fn.Max(B.date)==B.date)
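For comparison, here is a minimal sketch of the same greatest-per-group idea written as plain SQL and run through Python's sqlite3 module rather than Peewee. The table and column names (a, b, a_id, date) and the sample rows are assumptions chosen to mirror the models above, not what Peewee would necessarily generate. It joins b against a grouped subquery holding each a_id's maximum date, which avoids relying on how a backend treats ungrouped columns in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, some_field TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), date TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    INSERT INTO b VALUES
        (1, 1, '2020-01-01 10:00:00'),
        (2, 1, '2020-03-01 10:00:00'),
        (3, 2, '2020-02-01 10:00:00');
""")

# Join each b row to the per-a maximum date; only the newest b per a survives.
latest_rows = conn.execute("""
    SELECT a.some_field, b.id, b.date
    FROM b
    JOIN a ON a.id = b.a_id
    JOIN (SELECT a_id, MAX(date) AS max_date FROM b GROUP BY a_id) AS newest
      ON newest.a_id = b.a_id AND newest.max_date = b.date
    ORDER BY a.id
""").fetchall()
print(latest_rows)
```

If two B rows for the same A share the maximum date, this form returns both, so a tie-breaker (for example the highest id) would be needed where that matters.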

That query works only if you want the row with the latest date, not the most recently inserted row. If dates are set explicitly instead of being left at the default (datetime.datetime.now), the two can differ and this query will give the wrong result.
You can find the last date entry:
last_entry_date = B.select(B.date).order_by(B.id.desc()).limit(1).scalar()
and the related A records with this date:
with A and B fields:
q = A.select(A, B).join(B).where(B.date == last_entry_date)
with only the B fields:
q = B.select().join(A).where(B.date == last_entry_date)
If you want to find the latest B.date (as you do with the fn.Max(B.date)) and use it as the where filter:
latest_date = B.select(B.date).order_by(B.date.desc()).limit(1).scalar()

Related

Find duplicates based on grandparent-instance id and filter out older duplicates based on timestamp field

I’m trying to find duplicates of a Django model-object's instance based on grandparent-instance id and filter out older duplicates based on timestamp field.
I suppose I could do this with the distinct(*specify_fields) function, but I don’t use a PostgreSQL database (docs). I managed to achieve this with the following code:
queryset = MyModel.objects.filter(some_filtering…) \
    .only('parent_id__grandparent_id', 'timestamp', 'regular_fields'...) \
    .values('parent_id__grandparent_id', 'timestamp', 'regular_fields'...)
# compare all combinations and remove duplicates with older timestamps
list_of_dicts = list(queryset)
for a, b in itertools.combinations(list_of_dicts, 2):
    if a['parent_id__grandparent_id'] == b['parent_id__grandparent_id']:
        if a['timestamp'] > b['timestamp']:
            list_of_dicts.remove(b)
        else:
            list_of_dicts.remove(a)
However, this feels hacky and I guess this is not an optimal solution. Is there a better way (by better I mean more optimal, i.e. minimizing the number of times querysets are evaluated etc.)? Can I do the same with queryset’s methods?
My models look something like this:
class MyModel(models.Model):
    parent_id = models.ForeignKey('Parent'…
    timestamp = …
    regular_fields = …

class Parent(models.Model):
    grandparent_id = models.ForeignKey('Grandparent'…

class Grandparent(models.Model):
    …
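One possible way to avoid the pairwise comparison is a single pass that remembers the newest row seen for each grandparent id. The sketch below works on plain dicts shaped like the .values() output above; the helper name keep_newest_per_group and the sample rows are hypothetical. It still does the deduplication in Python after the queryset is evaluated once, so it is not a queryset-method solution.

```python
from datetime import datetime


def keep_newest_per_group(rows, group_key="parent_id__grandparent_id", ts_key="timestamp"):
    """Return one dict per group: the one with the greatest timestamp."""
    newest = {}
    for row in rows:
        current = newest.get(row[group_key])
        # Keep the row if the group is new or this row is more recent.
        if current is None or row[ts_key] > current[ts_key]:
            newest[row[group_key]] = row
    return list(newest.values())


# Hypothetical sample data standing in for list(queryset).
rows = [
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 1, 1), "regular_fields": "old"},
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 6, 1), "regular_fields": "new"},
    {"parent_id__grandparent_id": 2, "timestamp": datetime(2020, 3, 1), "regular_fields": "only"},
]
deduped = keep_newest_per_group(rows)
```

Unlike removing items from the list while iterating over its combinations, this visits each row once and never tries to remove the same element twice.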

Complex query using Django QuerySets

I am working on a personal project and I am trying to write a complex query that:
1. Gets every device that belongs to a certain user
2. Gets every sensor belonging to every one of the user's devices
3. Gets the last recorded value and timestamp for each of the user's devices' sensors.
I am using SQLite, and I managed to write the query as plain SQL; however, for the life of me I cannot figure out a way to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
       dev.name as device_name,
       stype.name as sensor_type,
       sen.gpio_port as sensor_port,
       sdata.value as sensor_latest_value,
       unit.name as sensor_units,
       sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
    SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
    FROM devices_sensordata as sd
    INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
    GROUP BY sd.sensor_id) as sdata on sdata.sensor_id = sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However it has two problems:
1. It does not include the sensors that do not yet have any values in SensorData
2. If I include the SensorData.value field in .values(), I start to get previously recorded values of the sensors
If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, I have in sensordata:
id | sensor_id | value | timestamp
1  | 1         | 2     | <today>
2  | 1         | 5     | <yesterday>
3  | 2         | 3     | <yesterday>
Only the latest should be returned for each sensor_id:
id | sensor_id | value | timestamp
1  | 1         | 2     | <today>
3  | 2         | 3     | <yesterday>
Or if the sensor does not yet have any data in this table, I want the query to return a record of it with "null" for value and timestamp (basically the left join in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
...     print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
    "devices_device"."name",
    "devices_sensortype"."name",
    "devices_sensor"."gpio_port",
    "devices_sensordata"."value",
    "devices_unit"."name",
    "devices_sensordata"."timestamp"
FROM
    "devices_sensor"
    LEFT OUTER JOIN "devices_sensordata" ON (
        "devices_sensor"."id" = "devices_sensordata"."sensor_id"
    )
    INNER JOIN "devices_device" ON (
        "devices_sensor"."device_id" = "devices_device"."id"
    )
    INNER JOIN "devices_sensortype" ON (
        "devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
    )
    INNER JOIN "devices_unit" ON (
        "devices_sensortype"."unit_id" = "devices_unit"."id"
    )
WHERE
    (
        NOT EXISTS(
            SELECT
                U0."id",
                U0."sensor_id",
                U0."value",
                U0."timestamp"
            FROM
                "devices_sensordata" U0
            WHERE
                (
                    U0."sensor_id" = ("devices_sensor"."id")
                    AND U0."timestamp" > ("devices_sensordata"."timestamp")
                )
        ) = True
        AND "devices_device"."user_id" = 1
    )
ORDER BY
    "devices_device"."name" ASC
Actually, your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use annotations and an Exists subquery in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user in the following way:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now, since you want to retrieve the sensors even if they don't have any data, this query will not suffice: only SensorData instances are retrieved, and the Sensor and Device must be accessed through fields. Unfortunately, Django does not allow for explicit joins through its ORM. Therefore I suggest the following (and let me say, this is far from ideal from a performance perspective).
The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:
latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))
sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')),
    latest_value_timestamp=Subquery(latest_data.values('timestamp'))
)  # This will generate two subqueries...
certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
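To make the shape of that annotation concrete, here is a sketch in plain SQL run through sqlite3. It is not the SQL Django actually emits, and the table names, column names, and sample rows are simplified assumptions based on the example data in the question. Each sensor row gets two correlated scalar subqueries that pick the value and timestamp of its newest reading; a sensor with no readings gets NULL for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor (id INTEGER PRIMARY KEY, gpio_port INTEGER);
    CREATE TABLE sensordata (
        id INTEGER PRIMARY KEY,
        sensor_id INTEGER REFERENCES sensor(id),
        value REAL,
        timestamp TEXT
    );
    INSERT INTO sensor VALUES (1, 17), (2, 18), (3, 19);
    INSERT INTO sensordata VALUES
        (1, 1, 2.0, '2019-06-02'),
        (2, 1, 5.0, '2019-06-01'),
        (3, 2, 3.0, '2019-06-01');
""")

# One row per sensor; sensor 3 has no readings and yields NULLs.
sensor_rows = conn.execute("""
    SELECT s.id,
           (SELECT d.value FROM sensordata d
             WHERE d.sensor_id = s.id
             ORDER BY d.timestamp DESC LIMIT 1) AS latest_value,
           (SELECT d.timestamp FROM sensordata d
             WHERE d.sensor_id = s.id
             ORDER BY d.timestamp DESC LIMIT 1) AS latest_value_timestamp
    FROM sensor s
    ORDER BY s.id
""").fetchall()
print(sensor_rows)
```

Because every sensor row is kept and the subqueries simply come back empty when there is no data, this behaves like the LEFT JOIN in the original SQL.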
For this kind of query, I strongly recommend using Q objects; see the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
It's perfectly fine to execute raw queries with Django, especially if they are that complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases, Django does no checking on the query.
This means that the security of the query is entirely your responsibility: sanitize the parameters.
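In practice, "sanitize the parameters" usually means passing values as bound parameters instead of formatting them into the SQL string. The sketch below shows the idea with the standard-library sqlite3 module and its ? placeholder; the table contents and the helper emails_for_user are made up for illustration. Django's raw() and cursor.execute() accept a params argument for the same purpose, using %s placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO accounts_user VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)


def emails_for_user(conn, user_id):
    # The ? placeholder makes the driver bind the value;
    # it is never spliced into the SQL text.
    cursor = conn.execute("SELECT email FROM accounts_user WHERE id = ?", (user_id,))
    return [row[0] for row in cursor]


safe = emails_for_user(conn, 1)
# A hostile string is compared as a single value, not parsed as SQL.
hostile = emails_for_user(conn, "1 OR 1=1")
```

With string formatting, the second call would have turned into a condition matching every row; with binding it matches none.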
Something like this?
Multiple devices for one user:
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(
    sensor__device__id__in=device_ids
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
One device, one user:
SensorData.objects.filter(
    sensor__device__user=user
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
That queryset will:
1. Get every device that belongs to a certain user.
2. Get every sensor belonging to each of the user's devices (it returns the sensor type's name for every sensor, because Sensor has no name field).
3. Get all recorded values and timestamps (ordered by latest timestamp first) for each of the user's devices' sensors.
UPDATE
Try this:
list_data = []
for _id in device_ids:
    # _id is a device id, so filter on the device rather than on the user
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values("sensor__id", "value", "timestamp", "sensor__device__user__id").latest("timestamp")
        list_data.append(data)

How to combine a custom date and time field in Django?

I'm looking for a way to combine a custom date value and a time field in django. My model only contains a time field. Now I have to annotate a new field combining a custom date and the time field. I thought the following code will solve my problem, but it only gives the date value. TimeField is ignored.
class MyModel(models.Model):
    my_time_field = TimeField()

custom_date = datetime.today().date()
objects = MyModel.objects.annotate(
    custom_datetime=Func(
        custom_date + F('my_time_field'),
        function='DATE'
    )
)
Please advise the right way to solve this issue.
You should be able to use a Value expression (see the docs) in a manner similar to this:
class MyModel(models.Model):
    my_time_field = TimeField()

custom_date = datetime.today().date()
MyModel.objects.annotate(
    custom_datetime=Value(
        datetime.datetime.combine(custom_date, F('my_time_field')),
        output_field=DateTimeField()))
The key parts are to combine your custom_date, the time from your my_time_field, and then output it as a DateTimeField.
It was an easy solution, but it took me a while to figure out. If anyone else has the same question, this is the answer: just use ExpressionWrapper.
objects = MyModel.objects.annotate(
    custom_datetime=ExpressionWrapper(
        custom_date + F('my_time_field'),
        output_field=DateTimeField()
    )
)
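If doing the combination in Python after the rows are fetched is acceptable, datetime.datetime.combine joins a date and a time directly. This is a sketch of that alternative, not of the annotate approach above; the fixed date and the sample time values are assumptions standing in for custom_date and whatever my_time_field holds.

```python
import datetime

custom_date = datetime.date(2019, 7, 4)  # assumed example date
# Stand-ins for my_time_field values read from the database.
time_values = [datetime.time(8, 30), datetime.time(17, 45, 10)]

# combine() takes a date object and a time object, not a query expression.
combined = [datetime.datetime.combine(custom_date, t) for t in time_values]
print(combined)
```

Note that combine() needs real date and time objects, so it cannot be applied to an F() expression inside a queryset.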

Use inner join on three tables and aggregation in a single query

I have three models,
class A(models.Model):
    code = models.CharField(max_length=9, unique=True)

class B(models.Model):
    submitted_by = models.ForeignKey(D)
    a = models.OneToOneField(A)
    name = models.CharField(max_length=70, blank=True, default='')

class C(models.Model):
    status = models.PositiveSmallIntegerField()
    status_time = models.DateTimeField(auto_now_add=True)
    a = models.ForeignKey(A)
I need a query that gets code (from model A), name (from model B), and status_time and status (from model C), where submitted_by_id=1 and status is the maximum for each id.
The SQL is:
SELECT A.code, B.name, C.status, C.status_time FROM `A` INNER JOIN `B` on A.id=B.a_id INNER JOIN `C` on A.id=C.a_id where B.submitted_by_id=1 and C.status_time=(select max(C.status_time) from `C` pipeline where C.a_id=A.id)
Can anyone help me with the Django ORM?
I don't understand how to use inner joins, aggregation and a subquery together in a single query.
EDIT:
B.objects.filter(submitted_by_id=1).values('name','a__code','a__c__status_time','a__c__status').order_by('-a__c__status').first()
I tried this query, but it returns only one row, i.e. the row with the max status.
Can we modify it to return the result for each id?
You don't use Django ORM like SQL. If I understand you correctly in Django ORM your query will look like:
result = []
bs = B.objects.filter(submitted_by_id=1)
for b in bs:
    a = b.a
    c = C.objects.filter(a=a).order_by('-status').first()
    result.append([a.code, b.name, c.status, c.status_time])
Maybe the following can help:
results = []
b_set = B.objects.filter(submitted_by__id=1)
for b in b_set:
    c = C.objects.filter(a__b=b).order_by('-status_time').first()
    results.append([c.a.code, c.a.b.name, c.status, c.status_time])
You can probably use annotation and select_related() to do what you want.
I have not tested this at all, but here's what I would try:
from django.db.models import F
from django.db.models import Max

annotated_c = C.objects.annotate(last_status_time=Max('a__cs__status_time'))
last_c = annotated_c.filter(status_time=F('last_status_time'),
                            a__b__submitted_by_id=1)
for c in last_c.select_related('a', 'a__b'):
    print(c.a.code, c.a.b.name, c.status_time)
You might need to add related_name to the foreign keys.
Set the django.db logger to DEBUG level to see exactly what SQL you are generating.
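As a rough sketch of that logging setup: Django writes executed SQL to the django.db.backends logger at DEBUG level, and it does so only while settings.DEBUG is True. A LOGGING dict along these lines would normally live in settings.py; the handler name "console" is an arbitrary choice, and the dictConfig call is here only so the snippet runs outside a Django project.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Print log records to stderr.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django logs each executed SQL statement to this logger at DEBUG level.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Django applies settings.LOGGING itself; applied by hand here for illustration.
logging.config.dictConfig(LOGGING)
sql_logger = logging.getLogger("django.db.backends")
```

For a one-off check in the shell, print(queryset.query) also shows the SQL for a single queryset without any logging configuration.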

sqlalchemy join and order by on multiple tables

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()

class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or sqlalchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell i.e. query.all() works. So now I try to apply order_by which doesn't throw an error until .all is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()
Assuming that the coalesce function is good enough, the examples below should point you in the right direction. One option automatically creates a list of children, while the other is explicit.
This is not the query you specified in your edit, but you are able to sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic

def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)

def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()
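To see why ordering needs a single, unambiguous name column, here is a sketch of the coalesce idea in plain SQL via sqlite3. The table names, the sort_name alias, and the sample rows are assumptions; the SQL that SQLAlchemy generates for the mapped classes will look different. Each source row is left-joined to all three child tables, and COALESCE picks whichever child supplied a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY);
    CREATE TABLE source_a (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    CREATE TABLE source_b (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    CREATE TABLE source_c (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
    INSERT INTO source VALUES (1), (2), (3);
    INSERT INTO source_a VALUES (1, 'pear');
    INSERT INTO source_b VALUES (2, 'apple');
    INSERT INTO source_c VALUES (3, 'mango');
""")

# Only one child table matches each source row, so COALESCE returns that child's name.
ordered = conn.execute("""
    SELECT s.id, COALESCE(a.name, b.name, c.name) AS sort_name
    FROM source s
    LEFT JOIN source_a a ON a.source_id = s.id
    LEFT JOIN source_b b ON b.source_id = s.id
    LEFT JOIN source_c c ON c.source_id = s.id
    ORDER BY sort_name
""").fetchall()
print(ordered)
```

Ordering by the labeled expression sidesteps both errors from the question: the column exists in the result set, and it does not depend on any one child table being in the FROM clause.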
