SQLAlchemy: Multiple tables and joins, duplicate rows - python

[This may well be a duplicate, but I've searched all day for a solution and can't get it working the way I'd like on my own.]
In MySQL, I've got three tables, named ecordov (A), ecordovadr (B) and ecrgposvk (C).
A single key links all three: (A) has one row per key, while (B) and (C) may have several rows per key. I'm no expert, but I believe that makes these one-to-many relationships.
I've read the SQLAlchemy docs and set up my tables like this:
class Ecordov(Base):
    __tablename__ = 'ecordov'
    oovkey = Column(BIGINT, primary_key=True)
    oovorder = Column(BIGINT)
    ecordovadr = relationship('Ecordovadr')
    ecrgposvk = relationship('Ecrgposvk')

class Ecordovadr(Base):
    __tablename__ = 'ecordovadr'
    ooakey = Column(BIGINT, primary_key=True)
    ooaname1 = Column(VARCHAR)
    ooaorder = Column(BIGINT, ForeignKey('ecordov.oovorder'))

class Ecrgposvk(Base):
    __tablename__ = 'ecrgposvk'
    rgkey = Column(BIGINT, primary_key=True)
    rgposvalue = Column(DOUBLE)
    rgposordnum = Column(BIGINT, ForeignKey('ecordov.oovorder'))
[So, as you see, the ForeignKeys aren't the primary_key(s), not really sure if this is a problem? However, I can't change the structure of the database.]
My sample query looks like:
jobs = session.query(Ecordov, func.group_concat(Ecordovadr.ooaname1.op('ORDER BY')(text('ecordovadr.ooatype, ecordovadr.ooarank separator "{}"'))).label('ooaname1')).outerjoin(Ecordovadr).filter(Ecordov.oovorder.like('75289')).group_by(Ecordov.oovorder)
gets evaluated to:
SELECT ecordov.oovkey AS ecordov_oovkey, ecordov.oovorder AS ecordov_oovorder, group_concat(ecordovadr.ooaname1 ORDER BY ecordovadr.ooatype, ecordovadr.ooarank separator "{}") AS ooaname1
FROM ecordov LEFT OUTER JOIN ecordovadr ON ecordov.oovorder = ecordovadr.ooaorder
WHERE ecordov.oovorder LIKE '75289'
GROUP BY ecordov.oovorder
and gives me the following:
for x in jobs:
    x.ooaname1
u'Sorbe priv.{}Lebensn\xe4he GmbH'
which is my desired outcome.
However, after joining the second table as well, regardless of whether I use an inner or an outer join, for example like this:
jobs = session.query(Ecordov, func.group_concat(Ecordovadr.ooaname1.op('ORDER BY')(text('ecordovadr.ooatype, ecordovadr.ooarank separator "{}"'))).label('ooaname1')).outerjoin(Ecordovadr).outerjoin(Ecrgposvk).filter(Ecordov.oovorder.like('75289')).group_by(Ecordov.oovorder)
which gets evaluated to:
SELECT ecordov.oovkey AS ecordov_oovkey, ecordov.oovorder AS ecordov_oovorder, group_concat(ecordovadr.ooaname1 ORDER BY ecordovadr.ooatype, ecordovadr.ooarank separator "{}") AS ooaname1
FROM ecordov LEFT OUTER JOIN ecordovadr ON ecordov.oovorder = ecordovadr.ooaorder LEFT OUTER JOIN ecrgposvk ON ecordov.oovorder = ecrgposvk.rgposordnum
WHERE ecordov.oovorder LIKE '75289'
GROUP BY ecordov.oovorder
gives me:
for x in jobs:
    x.ooaname1
u'Sorbe priv.{}Sorbe priv.{}Sorbe priv.{}Lebensn\xe4he GmbH{}Lebensn\xe4he GmbH{}Lebensn\xe4he GmbH'
So, the data is tripled now. I've read in other threads about this topic that this is to be expected, especially when using the same ForeignKey for multiple tables.
But I need this data "like before", which means just one entry, instead of three. I've tried using distinct() but without success so far.
Could someone please point me in the right direction on how to fix this?
Thanks in advance and all the best!
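One common way around this kind of multiplication is to aggregate each one-to-many table in its own subquery, so that it contributes at most one row per order before the join happens. The following is only a minimal sketch under stated assumptions: it uses in-memory SQLite instead of MySQL (so the ORDER BY and SEPARATOR syntax inside group_concat is dropped), it assumes SQLAlchemy 1.4 or later, the sample rows are made up, and the sum over rgposvalue is just a stand-in for whatever is needed from the third table.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Ecordov(Base):
    __tablename__ = "ecordov"
    oovkey = Column(Integer, primary_key=True)
    oovorder = Column(Integer, unique=True)


class Ecordovadr(Base):
    __tablename__ = "ecordovadr"
    ooakey = Column(Integer, primary_key=True)
    ooaname1 = Column(String)
    ooaorder = Column(Integer, ForeignKey("ecordov.oovorder"))


class Ecrgposvk(Base):
    __tablename__ = "ecrgposvk"
    rgkey = Column(Integer, primary_key=True)
    rgposvalue = Column(Float)
    rgposordnum = Column(Integer, ForeignKey("ecordov.oovorder"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up sample data: one order, two addresses, three positions.
    session.add(Ecordov(oovkey=1, oovorder=75289))
    session.add_all([
        Ecordovadr(ooakey=1, ooaname1="Sorbe priv.", ooaorder=75289),
        Ecordovadr(ooakey=2, ooaname1="Lebensnähe GmbH", ooaorder=75289),
    ])
    session.add_all([
        Ecrgposvk(rgkey=i, rgposvalue=10.0 * i, rgposordnum=75289) for i in (1, 2, 3)
    ])
    session.commit()

    # Collapse the addresses to one row per order *before* joining.
    names = (
        session.query(
            Ecordovadr.ooaorder.label("order_no"),
            func.group_concat(Ecordovadr.ooaname1, "{}").label("ooaname1"),
        )
        .group_by(Ecordovadr.ooaorder)
        .subquery()
    )

    rows = (
        session.query(Ecordov, names.c.ooaname1, func.sum(Ecrgposvk.rgposvalue))
        .outerjoin(names, names.c.order_no == Ecordov.oovorder)
        .outerjoin(Ecrgposvk, Ecrgposvk.rgposordnum == Ecordov.oovorder)
        .filter(Ecordov.oovorder == 75289)
        .group_by(Ecordov.oovkey, names.c.ooaname1)
        .all()
    )
    results = [(order.oovorder, joined, total) for order, joined, total in rows]

print(results)
```

Because the address subquery already yields a single row per order, the three position rows no longer repeat each name. On MySQL, GROUP_CONCAT(DISTINCT ...) is another option, but it would also collapse names that legitimately occur more than once.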

Related

Filter by field on relationship in SQLAlchemy

I have a rather special case: I want a group entity whose list contains only the elements that meet certain conditions.
These are the ORM classes that I have defined:
class Group(Base):
    __tablename__ = 'groups'
    id = Column(Integer, Identity(1, 1), primary_key=True)
    name = Column(String(50), nullable=False)
    elements = relationship('Element', foreign_keys='[Element.group_id]')

class Element(Base):
    __tablename__ = 'elements'
    id = Column(Integer, Identity(1, 1), primary_key=True)
    date = Column(Date, nullable=False)
    value = Column(Numeric(38, 10), nullable=False)
    group_id = Column(Integer, ForeignKey('groups.id'), nullable=False)
Now, I want to retrieve a group with all the elements of a specific date.
result = session.query(Group).filter(Group.name == 'group 1' and Element.date == '2021-05-27').all()
Sadly enough, the Group.name filter is working, but the retrieved group contains all elements, ignoring the Element.date condition.
As suggested by @van, I have tried:
query(Group).join(Element).filter(Group.name == 'group 1' and Element.date == '2021-05-27')
But I get every element again. On the logs I have noticed:
SELECT groups.id AS group_id, groups.name AS groups_name, element_1.id AS element_1_id, element_1.date AS element_1_date, element_1.value AS element_1_value, element_1.group_id AS element_1_group_id
FROM groups JOIN elements ON groups.id = elements.group_id LEFT OUTER JOIN elements AS elements_1 ON groups.id = elements_1.group_id
WHERE groups.name = %(name_1)s
There, I noticed two things. First, the join is being done twice (I guess one was already done just getting groups, before join). Second and most important: the date filter doesn't appear on the query.
I'm using the mssql+pymssql driver.
OK, there seems to be a combination of a few things happening here.
First, your relationship Group.elements will basically always contain all Elements of the Group. And this is completely separate from the filter, and this is how SA is supposed to work.
You can understand your current query (session.query(Group).filter(Group.name == 'group 1' and Element.date == '2021-05-27').all()) as the following:
"Return all Group instances which contain an Element for a given date."
But when you iterate over Group.elements, SA will make sure to return all children. This is what you are trying to solve.
Second, as pointed out by Yeong, you cannot use Python's plain and to build a SQL AND clause: Python evaluates and itself and passes on only one of the two expressions, so the other condition never reaches the query. Please fix this either by using and_ or by just having separate clauses:
result = (
    session.query(Group)
    .filter(Group.name == "group 1")
    .filter(Element.date == dat1)
    .all()
)
Third, as you later pointed out, your relationship is lazy="joined", which is why, whenever you query for Group, ALL the related Element instances are retrieved using an OUTER JOIN. This is also why adding .join(Element) to your query resulted in two JOINs.
Solution
You can "trick" SA into thinking that it loaded the whole Group.elements relationship when it only loaded the children you want, by using the orm.contains_eager() option. Your query would then look like this:
result = (
    session.query(Group)
    .join(Element)
    .filter(Group.name == "group 1")
    .filter(Element.date == dat1)
    .options(contains_eager(Group.elements))
    .all()
)
Above should work also with the lazy="joined" as the extra JOIN should not be generated anymore.
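To see the effect of contains_eager() end to end, here is a minimal sketch against in-memory SQLite. It assumes SQLAlchemy 1.4 or later, simplifies the models (no Identity, no value column), and uses made-up rows; the variable names wanted and loaded_dates are only for illustration.

```python
import datetime

from sqlalchemy import Column, Date, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()


class Group(Base):
    __tablename__ = "groups"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
    elements = relationship("Element")


class Element(Base):
    __tablename__ = "elements"
    id = Column(Integer, primary_key=True)
    date = Column(Date, nullable=False)
    group_id = Column(Integer, ForeignKey("groups.id"), nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
wanted = datetime.date(2021, 5, 27)

with Session(engine) as session:
    # One group with two elements, only one of which has the wanted date.
    session.add(Group(name="group 1", elements=[
        Element(date=wanted),
        Element(date=datetime.date(2021, 5, 28)),
    ]))
    session.commit()

# A fresh session, so nothing is cached from the insert above.
with Session(engine) as session:
    group = (
        session.query(Group)
        .join(Group.elements)
        .filter(Group.name == "group 1")
        .filter(Element.date == wanted)
        .options(contains_eager(Group.elements))
        .one()
    )
    # Only the rows that came back from the filtered JOIN populate the collection.
    loaded_dates = [element.date for element in group.elements]

print(loaded_dates)
```

Keep in mind that the collection now reflects the filter rather than the full relationship, so it is best treated as read-only for the lifetime of that session.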
Update
If you would like to get the groups even if there are no Elements matching the criteria, you need to replace join with outerjoin and place the filter on Element inside the outerjoin clause:
result = (
    session.query(Group)
    .filter(Group.name == "group 1")
    .outerjoin(
        Element, and_(Element.group_id == Group.id, Element.date == dat1)
    )
    .options(contains_eager(Group.elements))
    .all()
)
Python's and is not the same as the AND condition in SQL. SQLAlchemy handles the conjunction with its own and_() function instead, i.e.
result = session.query(Group).join(Element).filter(and_(Group.name == 'group 1', Element.date == '2021-05-27')).all()
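The difference is easy to check without a database by printing the two expressions. This is a small sketch assuming SQLAlchemy 1.4 or later and stripped-down versions of the two models; python_and and sql_and are illustrative names.

```python
from sqlalchemy import Column, Date, ForeignKey, Integer, String, and_
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Group(Base):
    __tablename__ = "groups"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class Element(Base):
    __tablename__ = "elements"
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    group_id = Column(Integer, ForeignKey("groups.id"))


# Python evaluates `and` itself and hands back just one of the operands,
# so only a single condition survives.
python_and = Group.name == "group 1" and Element.date == "2021-05-27"

# and_() builds one SQL expression that contains both conditions.
sql_and = and_(Group.name == "group 1", Element.date == "2021-05-27")

print(str(python_and))
print(str(sql_and))
```

The first print shows only the name comparison, which matches the WHERE clause seen in the question's log; the second shows both comparisons joined by AND.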

Can you add conditions to the on clause of a relationship join in SQLAlchemy? [duplicate]

Say I have the following two SQLAlchemy ORM classes:
import sqlalchemy as sa
from sqlalchemy.orm import relationship

class Address(Base):
    __tablename__ = 'DimAddress'
    AddressKey = sa.Column(sa.Integer, primary_key=True)
    # ... columns ...

class DealerOrganisation(Base):
    __tablename__ = 'DimDealerOrganisation'
    DealerOrganisationKey = sa.Column(sa.Integer, primary_key=True)
    # ... columns ...
    DealerOrganizationAddressKey = sa.Column(sa.Integer, sa.ForeignKey('DimAddress.AddressKey'), nullable=False)
    # ... columns ...
    address = relationship('Address')
I can get dealer organizations and their address, if present, as follows:
session = Session()
query = session.query(DealerOrganisation).outerjoin(DealerOrganisation.address).options(contains_eager(DealerOrganisation.address))
This gives me SQL approximately like this:
SELECT *
FROM DimDealerOrganisation
LEFT JOIN DimAddress ON AddressKey = DealerOrganizationAddressKey
But what if I want to do an ORM query for only a subset of related objects:
SELECT *
FROM DimDealerOrganisation
LEFT JOIN DimAddress ON AddressKey = DealerOrganizationAddressKey AND ZipCode = '90210'
That is, I want all the dealers, but I only want their address if the zip code is 90210. As far as I can tell, join() and outerjoin() let you specify either a relationship or an explicit condition, but not both. In this contrived example I could use an explicit condition and get back rows instead of ORM objects, but it would be unwieldy in a real query involving multiple tables and one-to-many relations. I want to add additional conditions to the on clause but still have it populate the address attribute of the returned DealerOrganisation objects. Is this possible?
You are going to want to use the and_ operator from SQLAlchemy in the join. I think it will look something like:
session = Session()
session.query(Table1).join(Table2, and_(Table1.address==Table2.address, Table1.zip == '90210'), isouter=True)
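Applied to the classes from the question, the idea can be combined with contains_eager() so that the address attribute is still populated on the returned objects. This is a sketch under assumptions: in-memory SQLite, SQLAlchemy 1.4 or later, a ZipCode column that the question's SQL implies but the class listing elides, and made-up rows.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, and_, create_engine
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()


class Address(Base):
    __tablename__ = "DimAddress"
    AddressKey = Column(Integer, primary_key=True)
    ZipCode = Column(String)  # assumed column, implied by the question's SQL


class DealerOrganisation(Base):
    __tablename__ = "DimDealerOrganisation"
    DealerOrganisationKey = Column(Integer, primary_key=True)
    DealerOrganizationAddressKey = Column(
        Integer, ForeignKey("DimAddress.AddressKey"), nullable=False
    )
    address = relationship("Address")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        DealerOrganisation(DealerOrganisationKey=1, address=Address(ZipCode="90210")),
        DealerOrganisation(DealerOrganisationKey=2, address=Address(ZipCode="10001")),
    ])
    session.commit()

# A fresh session, so the address attributes are loaded only by the query below.
with Session(engine) as session:
    dealers = (
        session.query(DealerOrganisation)
        .outerjoin(
            Address,
            and_(
                Address.AddressKey == DealerOrganisation.DealerOrganizationAddressKey,
                Address.ZipCode == "90210",  # the extra ON condition
            ),
        )
        .options(contains_eager(DealerOrganisation.address))
        .order_by(DealerOrganisation.DealerOrganisationKey)
        .all()
    )
    zips = [
        (d.DealerOrganisationKey, d.address.ZipCode if d.address else None)
        for d in dealers
    ]

print(zips)
```

Every dealer comes back, but address is only set where the zip code matched the ON clause; for the others it is None.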

How to perform a natural join on two tables using SQLAlchemy and Flask?

I have two tables, Entry and Group, defined in Python using Flask-SQLAlchemy and connected to a PostgreSQL database:
class Entry (db.Model):
    __tablename__ = "entry"
    id = db.Column('id', db.Integer, primary_key = True)
    group_title = db.Column('group_title', db.Unicode, db.ForeignKey('group.group_title'))
    url = db.Column('url', db.Unicode)
    next_entry_id = db.Column('next_entry_id', db.Integer, db.ForeignKey('entry.id'))
    entry = db.relationship('Entry', foreign_keys=next_entry_id)
    group = db.relationship('Group', foreign_keys=group_title)

class Group (db.Model):
    __tablename__ = "group"
    group_title = db.Column('group_title', db.Unicode, primary_key = True)
    group_start_id = db.Column('group_start_id', db.Integer)
    #etc.
I am trying to combine the two tables with a natural join using the Entry.id and Group.group_start_id as the common field.
I have been able to query a single table for all records. But I want to join tables by foreign key ID to give records relating Group.group_start_id and Group.group_title to a specific Entry record.
I am having trouble with the Flask-SQLAlchemy query syntax or process.
I have tried several approaches (to list a few):
db.session.query(Group, Entry.id).all()
db.session.query(Entry, Group).all()
db.session.query.join(Group).first()
db.session.query(Entry, Group).join(Group)
All of them have returned a list of tuples that is bigger than expected and does not contain what I want.
I am looking for the following result:
(Entry.id, group_title, Group.group_start_id, Entry.url)
I would be grateful for any help.
I used the following query to perform a natural join of the Group and Entry tables:
db.session.query(Entry, Group).join(Group).filter(Group.group_start_id == Entry.id).order_by(Group.order.asc())
I used .join() in the query to join the Group table to the Entry table. I then filtered the results on Group.group_start_id, which is a foreign key in the Group table referring to Entry.id, the primary key of the Entry table.
Since you have already declared the link between the two models with relationship(), we can focus on getting the data you want. A query such as db.session.query(Entry, Group).all() returns tuples of type (Entry, Group); without a join condition it pairs every Entry with every Group, which is why the result is bigger than expected. Once a join is in place, you can easily do something like:
test = db.session.query(Entry, Group).join(Group).first()
print(test[0].id)  # prints Entry.id
print(test[1].group_start_id)  # prints Group.group_start_id
#...
The SQLAlchemy documentation has a great article on how joins work.
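To get exactly the (Entry.id, group_title, Group.group_start_id, Entry.url) tuples from the question, one option is to select those columns directly and spell out the join condition. This is a sketch using plain SQLAlchemy (1.4 or later) with in-memory SQLite rather than Flask-SQLAlchemy and PostgreSQL; the models are trimmed and the rows are made up. With Flask-SQLAlchemy the same query would start from db.session.query(...).

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Group(Base):
    __tablename__ = "group"
    group_title = Column(Unicode, primary_key=True)
    group_start_id = Column(Integer)


class Entry(Base):
    __tablename__ = "entry"
    id = Column(Integer, primary_key=True)
    group_title = Column(Unicode, ForeignKey("group.group_title"))
    url = Column(Unicode)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Group(group_title="news", group_start_id=1))
    session.add_all([
        Entry(id=1, group_title="news", url="http://example.com/a"),
        Entry(id=2, group_title="news", url="http://example.com/b"),
    ])
    session.commit()

    # Select only the wanted columns and join on the explicit condition
    # Group.group_start_id == Entry.id instead of the foreign key.
    rows = (
        session.query(Entry.id, Group.group_title, Group.group_start_id, Entry.url)
        .join(Group, Group.group_start_id == Entry.id)
        .all()
    )
    result = [tuple(row) for row in rows]

print(result)
```

Only the entry that a group starts with is returned, one tuple per group, instead of every Entry paired with every Group.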

sqlalchemy join and order by on multiple tables

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()

class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or SQLAlchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell i.e. query.all() works. So now I try to apply order_by which doesn't throw an error until .all is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()
Assuming that the coalesce function is good enough, the examples below should point you in the right direction. One option builds the list of children automatically, while the other is explicit.
This is not the query you specified in your edit, but you are able to sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic

def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)

def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()
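For reference, here is a self-contained sketch of the with_polymorphic plus coalesce idea on plain declarative models. It rests on several assumptions: the question's framework types are replaced with ordinary Column definitions, a kind discriminator column is added for joined-table inheritance, only two subclasses are shown, SQLAlchemy 1.4 or later is used (not the 0.8.1 from the question), and the rows are made up.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base, with_polymorphic

Base = declarative_base()


class Source(Base):
    __tablename__ = "source"
    id = Column(Integer, primary_key=True)
    kind = Column(String)  # assumed discriminator column
    __mapper_args__ = {"polymorphic_on": kind, "polymorphic_identity": "source"}


class SourceA(Source):
    __tablename__ = "source_a"
    source_id = Column(Integer, ForeignKey("source.id"), primary_key=True)
    name = Column(String, nullable=False)
    __mapper_args__ = {"polymorphic_identity": "a"}


class SourceB(Source):
    __tablename__ = "source_b"
    source_id = Column(Integer, ForeignKey("source.id"), primary_key=True)
    name = Column(String, nullable=False)
    __mapper_args__ = {"polymorphic_identity": "b"}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([SourceB(name="beta"), SourceA(name="gamma"), SourceA(name="alpha")])
    session.commit()

    # LEFT OUTER JOIN both child tables onto source in a single query.
    AllSources = with_polymorphic(Source, [SourceA, SourceB])
    # Each row has exactly one non-NULL name; coalesce picks it for sorting.
    name_col = func.coalesce(AllSources.SourceA.name, AllSources.SourceB.name)
    names = [source.name for source in session.query(AllSources).order_by(name_col)]

print(names)
```

The query returns SourceA and SourceB instances mixed together, sorted by whichever name column is populated for each row.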

SQLAlchemy: several counts in one query

I am having a hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.
Suppose the following very basic one-to-many relationship:
class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key = True)
    children = relationship("Child", backref = "parent")

class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key = True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    naughty = Column(Boolean)
How could I:
1. Query tuples of (Parent, count_of_naughty_children, count_of_all_children) for each parent?
After decent time spent googling, I found how to query those values separately:
# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
    group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Child, session.query(Child).filter_by(naughty = True).\
    subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
    group_by(Parent.id)
I tried to combine them in different ways, but didn't manage to get what I want.
2. Query all parents which have more than 80% naughty children? Edit: naughty could be NULL.
I guess this query is going to be based on the previous one, filtering by naughty/all ratio.
Any help is appreciated.
EDIT : Thanks to Antti Haapala's help, I found solution to the second question:
avg = func.avg(func.coalesce(Child.naughty, 0)) # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
having(avg > 0.8)
It computes the average of the children's naughty values, treating False and NULL as 0 and True as 1. Tested with a MySQL backend, but it should work on others, too.
The count() SQL aggregate function is pretty simple; it gives you the total number of non-null values in each group. With that in mind, we can adjust your query to give you the proper result.
print (Query([
    Parent,
    func.count(Child.id),
    func.count(case(
        [((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])
    .join(Parent.children).group_by(Parent)
)
Which produces the following sql:
SELECT
parents.id AS parents_id,
count(children.id) AS count_1,
count(CASE WHEN (children.naughty = 1)
THEN children.id
ELSE NULL END) AS naughty
FROM parents
JOIN children ON parents.id = children.parent_id
GROUP BY parents.id
If you only need the parents for whom more than 80% of the children are naughty, you can, on most databases, cast naughty to an integer and take its average, then require that average to be greater than 0.8.
Thus you get something like
from sqlalchemy import Integer, func
from sqlalchemy.sql.expression import cast
naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
    .group_by(Parent.id).having(naughtyp > 0.8).all()
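Both parts of the question can be tried out together in a small runnable sketch. It assumes SQLAlchemy 1.4 or later, where case() takes its WHEN tuples positionally instead of as a list, in-memory SQLite, and made-up rows; an outer join is used for the counts so that parents without children still appear.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    naughty = Column(Boolean)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Parent(id=1, children=[Child(naughty=True), Child(naughty=False), Child(naughty=None)]),
        Parent(id=2, children=[Child(naughty=True) for _ in range(5)]),
        Parent(id=3),  # no children at all
    ])
    session.commit()

    # CASE without ELSE yields NULL, which count() skips.
    naughty_count = func.count(case((Child.naughty.is_(True), Child.id)))
    count_query = (
        session.query(Parent.id, naughty_count, func.count(Child.id))
        .outerjoin(Parent.children)
        .group_by(Parent.id)
        .order_by(Parent.id)
    )
    counts = [tuple(row) for row in count_query]

    # Share of naughty children; NULL and False both count as 0.
    ratio = func.avg(case((Child.naughty.is_(True), 1.0), else_=0.0))
    ratio_query = (
        session.query(Parent.id)
        .join(Parent.children)
        .group_by(Parent.id)
        .having(ratio > 0.8)
    )
    mostly_naughty = [row.id for row in ratio_query]

print(counts)
print(mostly_naughty)
```

Each tuple in counts is (parent id, naughty children, all children), and mostly_naughty lists the ids of the parents whose naughty share exceeds 0.8.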
