Use inner joins on three tables and aggregation in a single query - python

I have three models:
class A(models.Model):
    code = models.CharField(max_length=9, unique=True)
class B(models.Model):
    submitted_by = models.ForeignKey(D)
    a = models.OneToOneField(A)
    name = models.CharField(max_length=70, blank=True, default='')
class C(models.Model):
    status = models.PositiveSmallIntegerField()
    status_time = models.DateTimeField(auto_now_add=True)
    a = models.ForeignKey(A)
I need a query that returns code (from model A), name (from model B), and status and status_time (from model C), where submitted_by_id=1 and the status should be the maximum for each id.
The SQL is:
SELECT A.code, B.name, C.status, C.status_time
FROM `A`
INNER JOIN `B` ON A.id = B.a_id
INNER JOIN `C` ON A.id = C.a_id
WHERE B.submitted_by_id = 1
  AND C.status_time = (SELECT MAX(C.status_time) FROM `C` WHERE C.a_id = A.id)
Can anyone help me write this with the Django ORM?
I don't understand how to use inner joins, aggregation and a subquery together in a single query.
EDIT:
B.objects.filter(submitted_by_id=1).values('name','a__code','a__c__status_time','a__c__status').order_by('-a__c__status').first()
I tried this query, but it returns only one row, i.e. the row with the max status.
Can we modify it to return the result for each id?

You don't use the Django ORM like SQL. If I understand you correctly, in the Django ORM your query will look like:
result = []
bs = B.objects.filter(submitted_by_id=1)
for b in bs:
    a = b.a
    c = C.objects.filter(a=a).order_by('-status').first()
    result.append([a.code, b.name, c.status, c.status_time])

Maybe the following can help:
results = []
b_set = B.objects.filter(submitted_by__id=1)
for b in b_set:
    c = C.objects.filter(a__b=b).order_by('-status_time').first()
    results.append([c.a.code, c.a.b.name, c.status, c.status_time])

You can probably use annotation and select_related() to do what you want.
I have not tested this at all, but here's what I would try:
from django.db.models import F, Max
# 'a__cs' assumes related_name='cs' on C.a (see the note below)
annotated_c = C.objects.annotate(last_status_time=Max('a__cs__status_time'))
last_c = annotated_c.filter(status_time=F('last_status_time'),
                            a__b__submitted_by_id=1)
for c in last_c.select_related('a', 'a__b'):
    print(c.a.code, c.a.b.name, c.status_time)
You might need to add related_name to the foreign keys.
Set the django.db logger to DEBUG level to see exactly what SQL you are generating.
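To check the target result independently of the ORM, the SQL from the question can be run against an in-memory SQLite database. This is only a sketch: the table layout is a hand-written assumption that mirrors the three models (it is not the schema Django would generate), the inner table is aliased as c2 for readability, and the sample rows are made up.

```python
import sqlite3

# Hand-written stand-ins for models A, B and C (assumed layout, not Django's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER UNIQUE,
                    name TEXT, submitted_by_id INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER,
                    status INTEGER, status_time TEXT);
    INSERT INTO a VALUES (1, 'A1'), (2, 'A2'), (3, 'A3');
    INSERT INTO b VALUES (1, 1, 'first', 1), (2, 2, 'second', 1), (3, 3, 'third', 2);
    INSERT INTO c VALUES
        (1, 1, 1, '2020-01-01 10:00:00'),
        (2, 1, 2, '2020-01-02 10:00:00'),
        (3, 2, 1, '2020-01-03 10:00:00'),
        (4, 3, 5, '2020-01-04 10:00:00');
""")

# One row per A submitted by user 1, joined to its most recent C row.
latest_rows = conn.execute("""
    SELECT a.code, b.name, c.status, c.status_time
    FROM a
    INNER JOIN b ON a.id = b.a_id
    INNER JOIN c ON a.id = c.a_id
    WHERE b.submitted_by_id = ?
      AND c.status_time = (SELECT MAX(c2.status_time) FROM c AS c2 WHERE c2.a_id = a.id)
    ORDER BY a.code
""", (1,)).fetchall()

for row in latest_rows:
    print(row)
```

Comparing this output with the SQL that the django.db logger reports is one way to tell whether an ORM query matches the intended result.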

Related

How to compose Django Model Filter for relation existence?

There is a many-to-many relationship between A and B.
There are 3 tables to represent that relationship.
TableA, TableB, TableAB
Now I have another table, TableC, which has a foreign key to TableA, and I want to filter the TableC objects whose A has a relationship with TableB.
The following is high-level code to give you an idea of what the models look like.
class A:
    value = models.CharField(max_length=255)
class B:
    As = models.ManyToManyField('A', related_name='as')
class C:
    object_a = models.ForeignKey('A')
The SQL query looks like
SELECT
    *
FROM
    TABLE_C
WHERE (
    SELECT
        COUNT(*)
    FROM
        TABLE_AB
    WHERE
        TABLE_AB.A_id = TABLE_C.A_id
) > 0
I found a solution:
C.objects \
    .annotate(num=Count('object_a__as')) \
    .filter(num__gt=0)
It runs the following query
SELECT
    *, COUNT(TABLE_AB.A_id) as "num"
FROM
    TABLE_C
LEFT OUTER JOIN
    TABLE_AB
ON
    TABLE_C.A_id = TABLE_AB.A_id
GROUP BY
    TABLE_C.id
HAVING
    COUNT(TABLE_AB.B_id) > 0
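The COUNT(*) > 0 test in the hand-written SQL is an existence check, which SQL can also express with EXISTS. A minimal sketch using the standard-library sqlite3 module and made-up rows (the table and column names are simplified assumptions, not what Django generates) suggests the two forms select the same rows:

```python
import sqlite3

# Simplified stand-ins for TABLE_AB and TABLE_C (assumed layout, sample data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_ab (a_id INTEGER, b_id INTEGER);
    CREATE TABLE table_c (id INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO table_ab VALUES (1, 10), (1, 11), (3, 10);
    INSERT INTO table_c VALUES (1, 1), (2, 2), (3, 3), (4, 1);
""")

# Form used in the question: count the related rows and compare with zero.
count_ids = [row[0] for row in conn.execute("""
    SELECT id FROM table_c
    WHERE (SELECT COUNT(*) FROM table_ab WHERE table_ab.a_id = table_c.a_id) > 0
    ORDER BY id
""")]

# Equivalent existence test: true as soon as one related row is found.
exists_ids = [row[0] for row in conn.execute("""
    SELECT id FROM table_c
    WHERE EXISTS (SELECT 1 FROM table_ab WHERE table_ab.a_id = table_c.a_id)
    ORDER BY id
""")]

print(count_ids, exists_ids)
```

Neither form needs the GROUP BY that the annotate-based query introduces.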

Find duplicates based on grandparent-instance id and filter out older duplicates based on timestamp field

I’m trying to find duplicates of a Django model-object's instance based on grandparent-instance id and filter out older duplicates based on timestamp field.
I suppose I could do this with the distinct(*specify_fields) function, but I don't use a PostgreSQL database (docs). I managed to achieve this with the following code:
import itertools
queryset = MyModel.objects.filter(some_filtering…) \
    .only('parent_id__grandparent_id', 'timestamp', 'regular_fields'...) \
    .values('parent_id__grandparent_id', 'timestamp', 'regular_fields'...)
# compare all combinations and remove the duplicates with older timestamps
list_of_dicts = list(queryset)
for a, b in itertools.combinations(list_of_dicts, 2):
    if a['parent_id__grandparent_id'] == b['parent_id__grandparent_id']:
        if a['timestamp'] > b['timestamp']:
            list_of_dicts.remove(b)
        else:
            list_of_dicts.remove(a)
However, this feels hacky and I guess this is not an optimal solution. Is there a better way (by better I mean more optimal, i.e. minimizing the number of times querysets are evaluated etc.)? Can I do the same with queryset’s methods?
My models look something like this:
class MyModel(models.Model):
    parent_id = models.ForeignKey('Parent'…
    timestamp = …
    regular_fields = …
class Parent(models.Model):
    grandparent_id = models.ForeignKey('Grandparent'…
class Grandparent(models.Model):
    …
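One way to avoid comparing every pair in Python is to keep a dictionary keyed by the grandparent id and replace an entry only when a row with a newer timestamp shows up, which needs a single pass over the rows. The sketch below assumes the rows are dicts shaped like the output of .values() in the question; the helper name keep_newest_per_group and the sample data are hypothetical.

```python
from datetime import datetime


def keep_newest_per_group(rows, key="parent_id__grandparent_id", ts="timestamp"):
    """Return one row per `key` value, keeping the row with the largest `ts`."""
    newest = {}
    for row in rows:
        current = newest.get(row[key])
        # Keep the row if the group is new or this row is more recent.
        if current is None or row[ts] > current[ts]:
            newest[row[key]] = row
    return list(newest.values())


# Made-up rows standing in for list(queryset).
rows = [
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 1, 1), "regular_fields": "old"},
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 1, 5), "regular_fields": "new"},
    {"parent_id__grandparent_id": 2, "timestamp": datetime(2020, 1, 3), "regular_fields": "only"},
]
deduplicated = keep_newest_per_group(rows)
print(deduplicated)
```

The queryset is still evaluated once, as in the original code, but the work in Python grows linearly with the number of rows instead of quadratically.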

Fetching most recent related object for set of objects in Peewee

Suppose I have an object model A with a one-to-many relationship with B in Peewee using an sqlite backend. I want to fetch some set of A and join each with its most recent B. Is there a way to do this without looping?
class A(Model):
    some_field = CharField()
class B(Model):
    a = ForeignKeyField(A)
    date = DateTimeField(default=datetime.datetime.now)
The naive way would be to call order_by and limit(1), but that would apply to the entire query, so
q = A.select().join(B).order_by(B.date.desc()).limit(1)
will naturally produce a singleton result, as will
q = B.select().order_by(B.date.desc()).limit(1).join(A)
I am either using prefetch wrong or it doesn't work for this, because
q1 = A.select()
q2 = B.select().order_by(B.date.desc()).limit(1)
q3 = prefetch(q1,q2)
len(q3[0].a_set)
len(q3[0].a_set_prefetch)
Neither of those sets has length 1, as desired. Does anyone know how to do this?
I realize I needed to understand functions and group_by.
q = B.select().join(A).group_by(A).having(fn.Max(B.date)==B.date)
You can use it this way only if you want the latest date and not the last entry of the date. If the last date entry isn't the default one (datetime.datetime.now) this query will be wrong.
You can find the last date entry:
last_entry_date = B.select(B.date).order_by(B.id.desc()).limit(1).scalar()
and the related A records with this date:
with A and B fields:
q = A.select(A, B).join(B).where(B.date == last_entry_date)
with only the A fields:
q = B.select().join(A).where(B.date == last_entry_date)
If you want to find the latest B.date (as you do with the fn.Max(B.date)) and use it as the where filter:
latest_date = B.select(B.date).order_by(B.date.desc()).limit(1).scalar()

SQLAlchemy, is there any way to set filter on relationship condition

I have ORM models.
class A:
    id = ....
    type = 'dynamic' or None
class B:
    id =...
    rels = relationship(A)
Is there any way to set an option on the rels relationship so that session.query(B).get(id).rels returns already-filtered data?
This is described in the Specifying Alternate Join Conditions section in the docs:
rels = relationship(A, primaryjoin=and_(B.a_id == A.id, A.type == "dynamic"))
The caveat is that this filtering is global and not configurable per-query.

sqlalchemy join and order by on multiple tables

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()
class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all the tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or sqlalchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell i.e. query.all() works. So now I try to apply order_by which doesn't throw an error until .all is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()
Assuming that the coalesce function is good enough, the examples below should point you in the right direction. One option automatically creates a list of children, while the other is explicit.
This is not the query you specified in your edit, but you are able to sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic
def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
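For intuition about what the coalesce-based ordering does at the SQL level, here is a sketch using the standard-library sqlite3 module. The schema and rows are simplified, made-up stand-ins for the joined-table layout in the question, the alias sort_name is an illustrative choice, and the query is hand-written rather than the SQL that with_polymorphic emits.

```python
import sqlite3

# Simplified stand-ins for Source and its three child tables (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY);
    CREATE TABLE source_a (source_id INTEGER PRIMARY KEY REFERENCES source(id),
                           name TEXT NOT NULL);
    CREATE TABLE source_b (source_id INTEGER PRIMARY KEY REFERENCES source(id),
                           name TEXT NOT NULL);
    CREATE TABLE source_c (source_id INTEGER PRIMARY KEY REFERENCES source(id),
                           name TEXT NOT NULL);
    INSERT INTO source VALUES (1), (2), (3);
    INSERT INTO source_a VALUES (1, 'zeta');
    INSERT INTO source_b VALUES (2, 'alpha');
    INSERT INTO source_c VALUES (3, 'mu');
""")

# Each source row matches exactly one child table, so COALESCE picks the
# single non-NULL name, and the combined column can be used for sorting.
ordered = conn.execute("""
    SELECT s.id, COALESCE(a.name, b.name, c.name) AS sort_name
    FROM source AS s
    LEFT OUTER JOIN source_a AS a ON a.source_id = s.id
    LEFT OUTER JOIN source_b AS b ON b.source_id = s.id
    LEFT OUTER JOIN source_c AS c ON c.source_id = s.id
    ORDER BY sort_name
""").fetchall()

print(ordered)
```

The outer joins keep every source row even though each row has a name in only one child table, which is why a single combined column is needed before ordering.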
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()
