sqlalchemy join and order by on multiple tables - python

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()
class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or SQLAlchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
The query seems right at this point as far as I can tell, i.e. query.all() works. So now I try to apply order_by, which doesn't throw an error until .all() is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()

Assuming the coalesce function is good enough, the examples below should point you in the right direction. One option builds the list of children automatically, while the other is explicit.
This is not the query you specified in your edit, but it lets you sort by name (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic
def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    # take the first non-NULL name across the child tables
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
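For readers who want to try the coalesce approach end to end, here is a minimal sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer (not the 0.8.1 from the question), and the `kind` discriminator column, the table names, and the sample rows are hypothetical; SourceC is left out for brevity.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base, with_polymorphic

Base = declarative_base()


class Source(Base):
    __tablename__ = "source"
    id = Column(Integer, primary_key=True)
    kind = Column(String, nullable=False)  # hypothetical discriminator column
    __mapper_args__ = {"polymorphic_on": "kind", "polymorphic_identity": "source"}


class SourceA(Source):
    __tablename__ = "source_a"
    source_id = Column(ForeignKey("source.id"), primary_key=True)
    name = Column(String, nullable=False)
    __mapper_args__ = {"polymorphic_identity": "a"}


class SourceB(Source):
    __tablename__ = "source_b"
    source_id = Column(ForeignKey("source.id"), primary_key=True)
    name = Column(String, nullable=False)
    __mapper_args__ = {"polymorphic_identity": "b"}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # made-up sample rows
    session.add_all([SourceA(name="pear"), SourceB(name="apple"), SourceA(name="fig")])
    session.commit()

    subclasses = [SourceA, SourceB]
    # LEFT OUTER JOIN the base table to every child table
    AllSources = with_polymorphic(Source, subclasses)
    # each row has a name in exactly one child table, so coalesce picks it
    name_col = func.coalesce(*(cls.name for cls in subclasses)).label("name")
    names = [src.name for src in session.query(AllSources).order_by(name_col)]

print(names)  # ['apple', 'fig', 'pear']
```

The query returns mapped SourceA and SourceB instances, so it stays a regular Query object, which is what the framework in the question asks for.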

Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()

Related

SQLAlchemy: Delete all duplicate rows [duplicate]

I'm using SQLAlchemy to manage a database and I'm trying to delete all rows that contain duplicates. The table has an id (primary key) and domain name.
Example:
ID| Domain
1 | example-1.com
2 | example-2.com
3 | example-1.com
In this case I want to delete 1 instance of example-1.com. Sometimes I will need to delete more than 1 but in general the database should not have a domain more than once and if it does, only the first row should be kept and the others should be deleted.
Assuming your model looks something like this:
import sqlalchemy as sa
class Domain(Base):
    __tablename__ = 'domain_names'
    id = sa.Column(sa.Integer, primary_key=True)
    domain = sa.Column(sa.String)
Then you can delete the duplicates like this:
# Create a query that identifies the row for each domain with the lowest id
inner_q = session.query(sa.func.min(Domain.id)).group_by(Domain.domain)
aliased = sa.alias(inner_q)
# Select the rows that do not match the subquery
q = session.query(Domain).filter(~Domain.id.in_(aliased))
# Delete the unmatched rows (the ORM queues each delete and emits the DELETEs when the session is flushed)
for domain in q:
    session.delete(domain)
session.commit()
# Show remaining rows
for domain in session.query(Domain):
    print(domain)
print()
If you are not using the ORM, the core equivalent is:
meta = sa.MetaData()
domains = sa.Table('domain_names', meta, autoload=True, autoload_with=engine)
inner_q = sa.select([sa.func.min(domains.c.id)]).group_by(domains.c.domain)
aliased = sa.alias(inner_q)
with engine.connect() as conn:
    conn.execute(domains.delete().where(~domains.c.id.in_(aliased)))
This answer is based on the SQL provided in this answer. There are other ways of deleting duplicates, which you can see in the other answers on the link, or by googling "sql delete duplicates" or similar.
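The underlying SQL is not reproduced here, so the following is only a sketch of the kind of statement the queries above correspond to, run with the standard-library sqlite3 module on the example data from the question. Some databases (MySQL, for instance) reject a subquery on the table being deleted from unless it is wrapped in a derived table, so treat this as an illustration rather than portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_names (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_names (id, domain) VALUES (?, ?)",
    [(1, "example-1.com"), (2, "example-2.com"), (3, "example-1.com")],
)

# Keep the lowest id per domain and delete every other row.
conn.execute(
    """
    DELETE FROM domain_names
    WHERE id NOT IN (SELECT MIN(id) FROM domain_names GROUP BY domain)
    """
)

remaining = conn.execute(
    "SELECT id, domain FROM domain_names ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'example-1.com'), (2, 'example-2.com')]
conn.close()
```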

Confusing SQLAlchemy conversion of simple subquery

I've been wrestling with what should be a simple conversion of a straightforward SQL query into an SQLAlchemy expression, and I just cannot get things to line up the way I mean in the subquery. This is a single-table query of a "Comments" table; I want to find which users have made the most first comments:
SELECT user_id, count(*) AS count
FROM comments c
where c.date = (SELECT MIN(c2.date)
FROM comments c2
WHERE c2.post_id = c.post_id
)
GROUP BY user_id
ORDER BY count DESC
LIMIT 20;
I don't know how to write the subquery so that it refers to the outer query, and if I did, I wouldn't know how to assemble this into the outer query itself. (Using MySQL, which shouldn't matter.)
Well, after giving up for a while and then looking back at it, I came up with something that works. I'm sure there's a better way, but:
c2 = aliased(Comment)
firstdate = select([func.min(c2.date)]).\
    where(c2.post_id == Comment.post_id).\
    as_scalar()  # or scalar_subquery(), in SQLA 1.4
users = session.query(
        Comment.user_id, func.count('*').label('count')).\
    filter(Comment.date == firstdate).\
    group_by(Comment.user_id).\
    order_by(desc('count')).\
    limit(20)
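To check that the subquery really correlates with the outer query, the same idea can be run against a small in-memory SQLite table. This is a sketch assuming SQLAlchemy 1.4 or newer (hence `select(...)` with positional arguments and `scalar_subquery()`); the Comment model, the integer `date` column, and the sample rows are made up for the demonstration.

```python
from sqlalchemy import Column, Integer, create_engine, desc, func, select
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Comment(Base):
    __tablename__ = "comments"
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, nullable=False)
    user_id = Column(Integer, nullable=False)
    date = Column(Integer, nullable=False)  # integer timestamps keep the demo small


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Comment(post_id=1, user_id=10, date=1),  # first comment on post 1
        Comment(post_id=1, user_id=20, date=2),
        Comment(post_id=2, user_id=10, date=5),  # first comment on post 2
        Comment(post_id=2, user_id=30, date=6),
        Comment(post_id=3, user_id=20, date=3),  # first comment on post 3
    ])
    session.commit()

    c2 = aliased(Comment)
    # Comment.post_id belongs to the enclosing query, so the subquery
    # correlates on it and only selects from the alias c2.
    first_date = (
        select(func.min(c2.date))
        .where(c2.post_id == Comment.post_id)
        .scalar_subquery()
    )
    rows = (
        session.query(Comment.user_id, func.count().label("count"))
        .filter(Comment.date == first_date)
        .group_by(Comment.user_id)
        .order_by(desc("count"), Comment.user_id)
        .limit(20)
        .all()
    )
    top_users = [tuple(row) for row in rows]

print(top_users)  # [(10, 2), (20, 1)]
```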

How to perform a natural join on two tables using SQLAlchemy and Flask?

I have two tables, Entry and Group, defined in Python using Flask-SQLAlchemy and connected to a PostgreSQL database:
class Entry(db.Model):
    __tablename__ = "entry"
    id = db.Column('id', db.Integer, primary_key=True)
    group_title = db.Column('group_title', db.Unicode, db.ForeignKey('group.group_title'))
    url = db.Column('url', db.Unicode)
    next_entry_id = db.Column('next_entry_id', db.Integer, db.ForeignKey('entry.id'))
    entry = db.relationship('Entry', foreign_keys=next_entry_id)
    group = db.relationship('Group', foreign_keys=group_title)
class Group(db.Model):
    __tablename__ = "group"
    group_title = db.Column('group_title', db.Unicode, primary_key=True)
    group_start_id = db.Column('group_start_id', db.Integer)
    #etc.
I am trying to combine the two tables with a natural join using the Entry.id and Group.group_start_id as the common field.
I have been able to query a single table for all records. But I want to join tables by foreign key ID to give records relating Group.group_start_id and Group.group_title to a specific Entry record.
I am having trouble with the Flask-SQLAlchemy query syntax or process.
I have tried several approaches (to list a few):
db.session.query(Group, Entry.id).all()
db.session.query(Entry, Group).all()
db.session.query.join(Group).first()
db.session.query(Entry, Group).join(Group)
All of them have returned a list of tuples that is bigger than expected and does not contain what I want.
I am looking for the following result:
(Entry.id, group_title, Group.group_start_id, Entry.url)
I would be grateful for any help.
I used the following query to perform a natural join of the Group and Entry tables:
db.session.query(Entry, Group).join(Group).filter(Group.group_start_id == Entry.id).order_by(Group.order.asc())
I used the .join function to join the Group table to the Entry table. I then filtered the results on Group.group_start_id, which refers to Entry.id, the primary key of the Entry table.
Since you have already declared the relationship() between the two models, we can focus on getting the data you want. A query such as db.session.query(Entry, Group).all() returns tuples of type (Entry, Group), from which you can easily do something like:
test = db.session.query(Entry, Group).first()  # .one() raises unless exactly one row matches
print(test[0].id)  # prints Entry.id
print(test[1].group_start_id)  # prints Group.group_start_id
#...
SQLAlchemy has a great article on how joins work.
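To get exactly the (Entry.id, group_title, Group.group_start_id, Entry.url) rows asked for, one option is to select those columns and pass the join condition explicitly. The sketch below uses plain SQLAlchemy (1.4 or newer assumed) with in-memory SQLite instead of Flask-SQLAlchemy; the models are trimmed and the sample rows and URLs are hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Group(Base):
    __tablename__ = "group"
    group_title = Column(Unicode, primary_key=True)
    group_start_id = Column(Integer)


class Entry(Base):
    __tablename__ = "entry"
    id = Column(Integer, primary_key=True)
    group_title = Column(Unicode, ForeignKey("group.group_title"))
    url = Column(Unicode)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # made-up sample data: each group starts at one particular entry
    session.add_all([
        Group(group_title="news", group_start_id=1),
        Group(group_title="blogs", group_start_id=3),
        Entry(id=1, group_title="news", url="https://example.com/a"),
        Entry(id=2, group_title="news", url="https://example.com/b"),
        Entry(id=3, group_title="blogs", url="https://example.com/c"),
    ])
    session.commit()

    rows = (
        session.query(Entry.id, Group.group_title, Group.group_start_id, Entry.url)
        # explicit ON clause: match each group to its starting entry
        .join(Group, Group.group_start_id == Entry.id)
        .order_by(Entry.id)
        .all()
    )
    result = [tuple(row) for row in rows]

print(result)
```

Without an ON clause or filter, querying both entities pairs every Entry with every Group, which would explain the larger-than-expected list of tuples.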

SQLAlchemy: Multiple tables and joins, duplicate rows

[I feel that this is maybe / for sure (?) a duplicate, however, I've searched all day long to find a solution for this and it seems I can't get it working the way I would like to by myself.]
In MySQL, I've got three tables, named ecordov (A), ecordovadr (B) and ecrgposvk (C).
I've got one key linking all these; in (A) there is one row per key, in (B) and (C) there might be multiple rows per key, so, without being an expert in these questions, I think these are one-to-many relations.
I've read the SQLAlchemy docs and set up my tables like this:
class Ecordov(Base):
__tablename__ = 'ecordov'
oovkey = Column(BIGINT, primary_key=True)
oovorder = Column(BIGINT)
ecordovadr = relationship('Ecordovadr')
ecrgposvk = relationship('Ecrgposvk')
class Ecordovadr(Base):
__tablename__ = 'ecordovadr'
ooakey = Column(BIGINT, primary_key=True)
ooaname1 = Column(VARCHAR)
ooaorder = Column(BIGINT, ForeignKey('ecordov.oovorder'))
class Ecrgposvk(Base):
__tablename__ = 'ecrgposvk'
rgkey = Column(BIGINT, primary_key=True)
rgposvalue = Column(DOUBLE)
rgposordnum = Column(BIGINT, ForeignKey('ecordov.oovorder'))
[So, as you see, the ForeignKeys aren't the primary_key(s), not really sure if this is a problem? However, I can't change the structure of the database.]
My sample query looks like:
jobs = session.query(Ecordov, func.group_concat(Ecordovadr.ooaname1.op('ORDER BY')(text('ecordovadr.ooatype, ecordovadr.ooarank separator "{}"'))).label('ooaname1')).outerjoin(Ecordovadr).filter(Ecordov.oovorder.like('75289')).group_by(Ecordov.oovorder)
gets evaluated to:
SELECT ecordov.oovkey AS ecordov_oovkey, ecordov.oovorder AS ecordov_oovorder, group_concat(ecordovadr.ooaname1 ORDER BY ecordovadr.ooatype, ecordovadr.ooarank separator "{}") AS ooaname1
FROM ecordov LEFT OUTER JOIN ecordovadr ON ecordov.oovorder = ecordovadr.ooaorder
WHERE ecordov.oovorder LIKE '75289'
GROUP BY ecordov.oovorder
and gives me the following:
for x in jobs:
x.ooaname1
u'Sorbe priv.{}Lebensn\xe4he GmbH'
which is my desired outcome.
However, after joining the second table as well, regardless if using an inner- or outerjoin, via, for example, this:
jobs = session.query(Ecordov, func.group_concat(Ecordovadr.ooaname1.op('ORDER BY')(text('ecordovadr.ooatype, ecordovadr.ooarank separator "{}"'))).label('ooaname1')).outerjoin(Ecordovadr).outerjoin(Ecrgposvk).filter(Ecordov.oovorder.like('75289')).group_by(Ecordov.oovorder)
which gets evaluated to:
SELECT ecordov.oovkey AS ecordov_oovkey, ecordov.oovorder AS ecordov_oovorder, group_concat(ecordovadr.ooaname1 ORDER BY ecordovadr.ooatype, ecordovadr.ooarank separator "{}") AS ooaname1
FROM ecordov LEFT OUTER JOIN ecordovadr ON ecordov.oovorder = ecordovadr.ooaorder LEFT OUTER JOIN ecrgposvk ON ecordov.oovorder = ecrgposvk.rgposordnum
WHERE ecordov.oovorder LIKE '75289'
GROUP BY ecordov.oovorder
gives me:
for x in jobs:
x.ooaname1
u'Sorbe priv.{}Sorbe priv.{}Sorbe priv.{}Lebensn\xe4he GmbH{}Lebensn\xe4he GmbH{}Lebensn\xe4he GmbH'
So, the data is tripled now. I've read in other threads about this topic that this is to be expected, especially when using the same ForeignKey for multiple tables.
But I need this data "like before", which means just one entry, instead of three. I've tried using distinct() but without success so far.
Could someone please point me into one direction, how to fix this?
Thanks in advance and all the best!
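The multiplication can be reproduced outside SQLAlchemy. The sketch below uses the standard-library sqlite3 module with made-up rows (two addresses and three positions for one order) to show why every name appears three times, and one possible way around it: aggregate each child table in its own subquery before joining. The SUM over rgposvalue is only a hypothetical stand-in, since the question does not say which data is needed from ecrgposvk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ecordov (oovkey INTEGER PRIMARY KEY, oovorder INTEGER);
    CREATE TABLE ecordovadr (ooakey INTEGER PRIMARY KEY, ooaname1 TEXT, ooaorder INTEGER);
    CREATE TABLE ecrgposvk (rgkey INTEGER PRIMARY KEY, rgposvalue REAL, rgposordnum INTEGER);
    INSERT INTO ecordov VALUES (1, 75289);
    INSERT INTO ecordovadr VALUES (1, 'Sorbe priv.', 75289), (2, 'Lebensnähe GmbH', 75289);
    INSERT INTO ecrgposvk VALUES (1, 1.0, 75289), (2, 2.0, 75289), (3, 3.0, 75289);
""")

# Joining both one-to-many tables pairs every address with every position:
# 2 addresses x 3 positions = 6 rows for a single order.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM ecordov
    LEFT JOIN ecordovadr ON ecordov.oovorder = ecordovadr.ooaorder
    LEFT JOIN ecrgposvk ON ecordov.oovorder = ecrgposvk.rgposordnum
""").fetchone()[0]

# Aggregating each child table first leaves one row per order on both sides.
names, total = conn.execute("""
    SELECT adr.names, pos.total
    FROM ecordov
    LEFT JOIN (
        SELECT ooaorder, group_concat(ooaname1, '{}') AS names
        FROM ecordovadr GROUP BY ooaorder
    ) AS adr ON adr.ooaorder = ecordov.oovorder
    LEFT JOIN (
        SELECT rgposordnum, SUM(rgposvalue) AS total
        FROM ecrgposvk GROUP BY rgposordnum
    ) AS pos ON pos.rgposordnum = ecordov.oovorder
""").fetchone()

print(joined_rows, names, total)
conn.close()
```

In SQLAlchemy terms, each inner SELECT would presumably become a subquery that the outer query joins to, instead of joining the raw child tables directly.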

SQLAlchemy ORDER BY DESCENDING?

How can I use ORDER BY descending in a SQLAlchemy query like the following?
This query works, but returns them in ascending order:
query = (model.Session.query(model.Entry)
         .join(model.ClassificationItem)
         .join(model.EnumerationValue)
         .filter_by(id=c.row.id)
         .order_by(model.Entry.amount)  # This row :)
         )
If I try:
.order_by(desc(model.Entry.amount))
then I get: NameError: global name 'desc' is not defined.
Just as an FYI, you can also specify those things as column attributes. For instance, I might have done:
.order_by(model.Entry.amount.desc())
This is handy since it avoids an import, and you can use it on other places such as in a relation definition, etc.
For more information, you can refer to the SQLAlchemy 1.4 documentation:
from sqlalchemy import desc
someselect.order_by(desc(table1.mycol))
Usage from @jpmc26
One other thing you might do is:
.order_by("name desc")
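This will result in: ORDER BY name desc. The disadvantage is that the column name is hard-coded in the string. Newer SQLAlchemy releases expect such a raw fragment to be wrapped in text(), e.g. .order_by(text("name desc")).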
You can use .desc() function in your query just like this
query = (model.Session.query(model.Entry)
         .join(model.ClassificationItem)
         .join(model.EnumerationValue)
         .filter_by(id=c.row.id)
         .order_by(model.Entry.amount.desc())
         )
This will order by amount in descending order
or
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    model.Entry.amount.desc()
)
Use of desc function of SQLAlchemy
from sqlalchemy import desc
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    desc(model.Entry.amount)
)
For the official docs, please use the link or check the snippet below:
sqlalchemy.sql.expression.desc(column)
Produce a descending ORDER BY clause element.
e.g.:
from sqlalchemy import desc
stmt = select([users_table]).order_by(desc(users_table.c.name))
will produce SQL as:
SELECT id, name FROM user ORDER BY name DESC
The desc() function is a standalone version of the ColumnElement.desc() method available on all SQL expressions, e.g.:
stmt = select([users_table]).order_by(users_table.c.name.desc())
Parameters: column – A ColumnElement (e.g. scalar SQL expression) with which to apply the desc() operation.
See also: asc(), nullsfirst(), nullslast(), Select.order_by()
You can try: .order_by(ClientTotal.id.desc())
session = Session()
auth_client_name = 'client3'
result_by_auth_client = (session.query(ClientTotal)
                         .filter(ClientTotal.client == auth_client_name)
                         .order_by(ClientTotal.id.desc())
                         .all())
for rbac in result_by_auth_client:
    print(rbac.id)
session.close()
Complementing @Radu's answer: as in SQL, you can add the table name to the parameter if several tables have an attribute with the same name.
.order_by("TableName.name desc")
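The variants discussed in these answers can be compared side by side. The following is a small sketch using SQLAlchemy Core (1.4 or newer assumed) and in-memory SQLite; the `users` table and its rows are hypothetical. The string form is wrapped in text(), which is what recent releases appear to expect for raw ORDER BY fragments.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, desc, select, text,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(users.insert(), [{"name": "ann"}, {"name": "cid"}, {"name": "bob"}])

    by_function = select(users.c.name).order_by(desc(users.c.name))   # standalone desc()
    by_method = select(users.c.name).order_by(users.c.name.desc())    # column method
    by_text = select(users.c.name).order_by(text("name DESC"))        # raw SQL fragment

    results = [
        [row[0] for row in conn.execute(stmt)]
        for stmt in (by_function, by_method, by_text)
    ]

print(str(by_function) == str(by_method))  # both render the same SQL
print(results[0])  # ['cid', 'bob', 'ann']
```

The function and method forms compile to identical SQL, so choosing between them is mostly a matter of whether you want the extra import.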
