sqlalchemy FULL OUTER JOIN - python

How can I implement a FULL OUTER JOIN in sqlalchemy at the ORM level?
Here is my code:
q1 = (db.session.query(
tb1.user_id.label('u_id'),
func.count(tb1.id).label('tb1_c')
)
.group_by(tb1.user_id)
)
q2 = (db.session.query(
tb2.user_id.label('u_id'),
func.count(tb2.id).label('tb2_c')
)
.group_by(tb2.user_id)
)
I have the two queries above and want to apply a FULL OUTER JOIN to them.

Since version 1.1, sqlalchemy fully supports FULL OUTER JOIN. See here: https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.join.params.full
So for your code you would want to do:
q1 = (db.session.query(
tb1.user_id.label('u_id'),
func.count(tb1.id).label('tb1_c')
)
.group_by(tb1.user_id)
).cte('q1')
q2 = (db.session.query(
tb2.user_id.label('u_id'),
func.count(tb2.id).label('tb2_c')
)
.group_by(tb2.user_id)
).cte('q2')
# Columns of a CTE are reached through its .c collection.
result = db.session.query(
    func.coalesce(q1.c.u_id, q2.c.u_id).label('u_id'),
    q1.c.tb1_c,
    q2.c.tb2_c
).select_from(q1).join(
    q2,
    q1.c.u_id == q2.c.u_id,
    full=True
)
Note that as with any FULL OUTER JOIN, tb1_c and tb2_c may be null so you might want to apply a coalesce on them.

First of all, sqlalchemy (before version 1.1) does not support FULL JOIN out of the box, and for some good reasons. So any solution proposed will consist of two parts:
a work-around for missing functionality
sqlalchemy syntax to build a query for that work-around
Now, for the reasons to avoid FULL JOIN, please read the old blog post "Better Alternatives to a FULL OUTER JOIN".
From that blog post I will take the idea of avoiding FULL JOIN by adding 0 values for the missing columns and aggregating (SUM) over a UNION ALL instead. The SA code might look something like this:
from sqlalchemy import func, literal, select, union_all
q1 = (session.query(
    tb1.user_id.label('u_id'),
    func.count(tb1.id).label('tb1_c'),
    literal(0).label('tb2_c'),  # NOTE: added 0
).group_by(tb1.user_id))
q2 = (session.query(
    tb2.user_id.label('u_id'),
    literal(0).label('tb1_c'),  # NOTE: added 0
    func.count(tb2.id).label('tb2_c')
).group_by(tb2.user_id))
qt = union_all(q1, q2).alias("united")
qr = select([qt.c.u_id, func.sum(qt.c.tb1_c), func.sum(qt.c.tb2_c)]).group_by(qt.c.u_id)
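To see what this pattern computes, here is a minimal sketch of the same UNION ALL plus SUM idea in plain SQL, run through Python's built-in sqlite3 module instead of sqlalchemy. The table layout (tb1 and tb2 with id and user_id columns) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb1 (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tb2 (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- user 1 appears in both tables, user 2 only in tb1, user 3 only in tb2
    INSERT INTO tb1 (user_id) VALUES (1), (1), (2);
    INSERT INTO tb2 (user_id) VALUES (1), (3), (3);
""")

# Each branch pads the column it cannot provide with 0, so the outer
# SUM ... GROUP BY folds both branches into one row per user.
rows = conn.execute("""
    SELECT u_id, SUM(tb1_c) AS tb1_c, SUM(tb2_c) AS tb2_c
    FROM (
        SELECT user_id AS u_id, COUNT(id) AS tb1_c, 0 AS tb2_c
        FROM tb1 GROUP BY user_id
        UNION ALL
        SELECT user_id AS u_id, 0 AS tb1_c, COUNT(id) AS tb2_c
        FROM tb2 GROUP BY user_id
    ) AS united
    GROUP BY u_id
    ORDER BY u_id
""").fetchall()

for row in rows:
    print(row)
```

Users that exist in only one table still get a row, with 0 rather than NULL in the column for the other table.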
Having composed the query above, I might actually consider other options:
simply execute the two queries separately and merge the results in Python (for result sets that are not too large)
given that this looks like reporting functionality rather than part of the business model workflow, write a SQL query and execute it directly via the engine (only if it really performs much better, though)
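The first option could look roughly like the sketch below. The helper name merge_counts is hypothetical, and it assumes each query yields rows of the form (user_id, count), such as those the two grouped queries from the question would return.

```python
def merge_counts(tb1_counts, tb2_counts):
    """Combine two lists of (user_id, count) rows into (user_id, tb1_c, tb2_c)."""
    left = dict(tb1_counts)
    right = dict(tb2_counts)
    # The union of the keys plays the role of the FULL OUTER JOIN;
    # a user missing on one side gets a count of 0 for that side.
    return [
        (u_id, left.get(u_id, 0), right.get(u_id, 0))
        for u_id in sorted(left.keys() | right.keys())
    ]


# Made-up sample rows: user 2 exists only on the left, user 3 only on the right.
merged = merge_counts([(1, 2), (2, 1)], [(1, 1), (3, 2)])
print(merged)
```

This keeps both SQL statements trivial, at the cost of pulling both result sets into memory.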

Related

MYSQL PeeWee Full Join without RawQuery

I am using PeeWee with MySQL. I have two tables that need a full join to keep records from both the left and right sides. MySQL doesn't support this directly, but I have used "Method 2" in this helpful article - http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/ to create a Full Join SQL statement that seems to work for my data.
It requires a "UNION ALL" of a "LEFT OUTER JOIN" and a "RIGHT OUTER JOIN", with duplicate data excluded from the 2nd result set.
I'm matching up backup-tape barcodes in the two tables.
SQL
SELECT * FROM mediarecall AS mr
LEFT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
UNION ALL
SELECT * FROM mediarecall AS mr
RIGHT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
WHERE mr.alternateCode IS NULL
However, when I came to bring this into my python script using PeeWee, I discovered that there doesn't seem to be a JOIN.RIGHT_OUTER to allow me to re-create this SQL. I have used plenty of JOIN.LEFT_OUTER in the past, but this is the first time I have needed a Full Join.
I can make PeeWee work with a RawQuery(), of course, but I'd love to keep my code looking more elegant if I can.
Has anyone managed to re-create a Full Join with MySQL and PeeWee without resorting to RawQuery?
I had envisaged something like the following (which I know is invalid):
left_media = (MediaRecall
    .select()
    .join(Media, JOIN.LEFT_OUTER,
          on=(MediaRecall.alternateCode == Media.tapeLabel)))
right_media = (MediaRecall
    .select()
    .join(Media, JOIN.RIGHT_OUTER,
          on=(MediaRecall.alternateCode == Media.tapeLabel))
    .where(MediaRecall.alternateCode >> None))  # Exclude duplicates
all_media = (left_media | right_media)  # UNION of the 2 results, which I
                                        # can then use .where(), etc. on
You can add support for right outer:
from peewee import JOIN
JOIN['RIGHT_OUTER'] = 'RIGHT OUTER'
Then you can use JOIN.RIGHT_OUTER.
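Another way to sidestep the missing join type: a RIGHT OUTER JOIN is equivalent to a LEFT OUTER JOIN with the two tables swapped, so the second half of the union can start from media instead. The sketch below checks this in plain SQL using Python's built-in sqlite3 rather than MySQL and PeeWee, and the sample barcodes are made up. The column list is spelled out because swapping the tables would otherwise change the column order of SELECT *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mediarecall (alternateCode TEXT);
    CREATE TABLE media (tapeLabel TEXT);
    -- A001 exists only in mediarecall, A003 only in media, A002 in both
    INSERT INTO mediarecall VALUES ('A001'), ('A002');
    INSERT INTO media VALUES ('A002'), ('A003');
""")

rows = conn.execute("""
    SELECT mr.alternateCode, m.tapeLabel
    FROM mediarecall AS mr
    LEFT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
    UNION ALL
    -- same rows a RIGHT OUTER JOIN would give, written as a swapped LEFT join
    SELECT mr.alternateCode, m.tapeLabel
    FROM media AS m
    LEFT OUTER JOIN mediarecall AS mr ON mr.alternateCode = m.tapeLabel
    WHERE mr.alternateCode IS NULL
""").fetchall()

for row in rows:
    print(row)
```

In PeeWee terms, the second query would start from Media and join MediaRecall with JOIN.LEFT_OUTER, which needs no extra join type.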

How to select specific columns of multi-column join in sqlalchemy?

We are testing the possibility to implement SQLAlchemy to handle our database work. In some instances I need to join a database to a clone of itself (with potentially different data, of course).
An example of the SQL I need to replicate is as follows:
SELECT lt.name, lt.date, lt.type
FROM dbA.dbo.TableName as lt
LEFT JOIN dbB.dbo.TableName as rt
ON lt.name = rt.name
AND lt.date = rt.date
WHERE rt.type is NULL
So far I have tried using the join object but I can't get it to not spit the entire join out. I have also tried various .join() methods based on the tutorial here: http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html and I keep getting either an AttributeError: "mapper" or a result that is not what I'm looking for.
The issue I'm running into is that I not only need to join on multiple fields, but I also can't have any foreign key relationships built into the objects or tables.
Thanks to Kay's link, I think I figured out the solution.
It looks like it can be solved by:
from sqlalchemy import and_
session.query(dbA_TableName).outerjoin(
    dbB_TableName,
    and_(dbA_TableName.name == dbB_TableName.name,
         dbA_TableName.date == dbB_TableName.date)
).filter(dbB_TableName.type.is_(None))
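For reference, the SQL from the question can be tried out with Python's built-in sqlite3 module. This is only a sketch: two attached in-memory databases stand in for dbA and dbB, the dbo schema level is dropped because SQLite has no equivalent, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for dbA and dbB.
conn.execute("ATTACH DATABASE ':memory:' AS dbA")
conn.execute("ATTACH DATABASE ':memory:' AS dbB")
for db in ("dbA", "dbB"):
    conn.execute(f"CREATE TABLE {db}.TableName (name TEXT, date TEXT, type TEXT)")

conn.executemany(
    "INSERT INTO dbA.TableName VALUES (?, ?, ?)",
    [("alpha", "2020-01-01", "x"), ("beta", "2020-01-02", "y")],
)
# Only 'alpha' has a counterpart in the clone.
conn.execute("INSERT INTO dbB.TableName VALUES ('alpha', '2020-01-01', 'x')")

# Left rows with no match on (name, date) come back with NULLs on the
# right side, which is what the WHERE clause selects.
rows = conn.execute("""
    SELECT lt.name, lt.date, lt.type
    FROM dbA.TableName AS lt
    LEFT JOIN dbB.TableName AS rt
        ON lt.name = rt.name AND lt.date = rt.date
    WHERE rt.type IS NULL
""").fetchall()

print(rows)
```

Note that the filter also matches rows whose counterpart exists but has a NULL type, exactly as the original SQL does.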

Convert SQL query with JOIN ON to SQLAlchemy

My query looks like so (the '3' and '4' of course will be different in real usage):
SELECT op_entries.*, op_entries_status.*
FROM op_entries
LEFT OUTER JOIN op_entries_status ON op_entries.id = op_entries_status.op_id AND op_entries_status.order_id = 3
WHERE op_entries.op_artikel_id = 4 AND op_entries.active
ORDER BY op_entries.id
This is to get all stages (operations) in the production of an article/order-combination as well as the current status (progress) for each stage, if a status entry exists. If not the stage must still be returned, but the status rows be null.
I'm having immense problems getting this to work in SQLAlchemy. This would have been a two-part question, but I already found the way to do this in plain SQL here. In the ORM it's a different story: I can't even figure out from the documentation how to write JOIN ON conditions!
Edit (new users are not allowed to answer their own question):
Believe I solved it, I guess writing it down as a question helped! Maybe this will help some other newbie out there.
query = db.session.query(OpEntries, OpEntriesStatus).\
outerjoin(OpEntriesStatus, db.and_(
OpEntries.id == OpEntriesStatus.op_id,
OpEntriesStatus.order_id == arg_order_id)).\
filter(db.and_(
OpEntries.op_artikel_id == artidQuery,
OpEntries.active)).\
order_by(OpEntries.id).\
all()
I'm keeping this open in case someone got a better solution or any insights.
Assuming some naming convention, the below should do it:
# outerjoin() keeps stages that have no status row, matching the LEFT OUTER JOIN in the SQL
qry = (session.query(OpEntry, OpEntryStatus)
    .outerjoin(OpEntryStatus, and_(OpEntry.id == OpEntryStatus.op_id, OpEntryStatus.order_id == 3))
    .filter(OpEntry.op_artikel_id == 4)
    .filter(OpEntry.active == 1)
    .order_by(OpEntry.id)
)
Read the documentation for join and outerjoin for more information on joins; the second parameter is the ON clause. If you need more than one condition, just use and_ or or_ to build any expression you need.
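Why the order_id condition belongs in the ON clause rather than in the WHERE clause can be checked with a small sketch using Python's built-in sqlite3. The table names follow the question; the progress column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE op_entries (id INTEGER PRIMARY KEY, op_artikel_id INTEGER, active INTEGER);
    CREATE TABLE op_entries_status (op_id INTEGER, order_id INTEGER, progress TEXT);
    INSERT INTO op_entries VALUES (1, 4, 1), (2, 4, 1);
    -- stage 1 has a status for order 3; stage 2 only has one for order 7
    INSERT INTO op_entries_status VALUES (1, 3, 'done'), (2, 7, 'started');
""")

# Condition in the ON clause: every stage is kept, status is NULL if missing.
in_on = conn.execute("""
    SELECT e.id, s.progress
    FROM op_entries AS e
    LEFT OUTER JOIN op_entries_status AS s
        ON e.id = s.op_id AND s.order_id = 3
    WHERE e.op_artikel_id = 4 AND e.active
    ORDER BY e.id
""").fetchall()

# Same condition moved to WHERE: stages without a matching status disappear.
in_where = conn.execute("""
    SELECT e.id, s.progress
    FROM op_entries AS e
    LEFT OUTER JOIN op_entries_status AS s ON e.id = s.op_id
    WHERE e.op_artikel_id = 4 AND e.active AND s.order_id = 3
    ORDER BY e.id
""").fetchall()

print(in_on)
print(in_where)
```

The first query returns both stages, the second only stage 1, which is why the and_() expression is passed as the join condition instead of being added to filter().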

Django query based on FK — get all, not any

I need to find orders in which all order items have a completed status. It looks like this:
FINISHED_STATUSES = [17,18,19]
if active_tab == 'outstanding':
    orders = orders.exclude(items__status__in=FINISHED_STATUSES)
However, this query only gives me orders in which any order item has a completed status. How would I write the query so that I retrieve only those orders in which ALL order items have a completed status?
I think you need a raw query here.
Assuming your order and item models are named Orders and Items:
# raw query: keep only orders whose finished-item count equals their total item count
sql = """\
select `orders`.* from `%(orders_table)s` as `orders`
join `%(items_table)s` as `items`
on `items`.`%(item_order_fk)s` = `orders`.`%(order_pk)s`
group by `orders`.`%(order_pk)s`
having sum(case when `items`.`%(status_field)s` in (%(status_list)s) then 1 else 0 end) = count(*);
""" % {
    "orders_table": Orders._meta.db_table,
    "items_table": Items._meta.db_table,
    "order_pk": Orders._meta.pk.column,
    "item_order_fk": Items._meta.get_field("order").column,
    "status_field": Items._meta.get_field("status").column,
    "status_list": str(FINISHED_STATUSES)[1:-1],
}
orders = Orders.objects.raw(sql)
I was able to get this done in a somewhat hackish way. First, I added an additional Boolean column, is_finished. Then, to find orders with at least one non-finished item:
orders = orders.filter(items__status__is_finished=False)
This gives me all un-finished orders.
Doing the opposite of that gets the finished orders:
orders = orders.exclude(items__status__is_finished=False)
Adding the boolean field is a good idea. That way you have your business rules clearly defined in the model.
Now, let's say that you still wanted to do it without resorting to adding fields. This may very well be a requirement given a different set of circumstances. Unfortunately, you can't really use subqueries or arbitrary joins in the Django ORM. You could, however, build up Q objects and make an implicit join in the having clause using filter() and annotate().
from django.db.models.aggregates import Count
from django.db.models import Q
from functools import reduce
from operator import or_
total_items_by_orders = Orders.objects.annotate(
item_count=Count('items'))
finished_items_by_orders = Orders.objects.filter(
items__status__in=FINISHED_STATUSES).annotate(
item_count=Count('items'))
orders = total_items_by_orders.exclude(
reduce(or_, (Q(id=o.id, item_count=o.item_count)
for o in finished_items_by_orders)))
Note that using raw SQL, while less elegant, would usually be more efficient.
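As an illustration of what such raw SQL could look like, here is a minimal sketch that selects orders whose items are all finished by comparing the finished-item count with the total item count per order. It uses Python's built-in sqlite3 instead of Django, and the table names, column names and sample rows are assumptions.

```python
import sqlite3

FINISHED_STATUSES = [17, 18, 19]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, status INTEGER);
    INSERT INTO orders (id) VALUES (1), (2), (3);
    -- orders 1 and 3: all items finished; order 2: one item still open (status 5)
    INSERT INTO items (order_id, status)
    VALUES (1, 17), (1, 18), (2, 17), (2, 5), (3, 19);
""")

# One "?" placeholder per status, so the values are bound rather than inlined.
placeholders = ", ".join("?" for _ in FINISHED_STATUSES)
finished_orders = conn.execute(f"""
    SELECT o.id
    FROM orders AS o
    JOIN items AS i ON i.order_id = o.id
    GROUP BY o.id
    HAVING SUM(CASE WHEN i.status IN ({placeholders}) THEN 1 ELSE 0 END) = COUNT(*)
    ORDER BY o.id
""", FINISHED_STATUSES).fetchall()

print(finished_orders)
```

Because of the inner join, orders without any items are not returned; whether such orders should count as finished is a business decision.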

Filtering with joined tables

I'm trying to get some query performance improved, but the generated query does not look the way I expect it to.
The results are retrieved using:
query = (session.query(SomeModel)
    .options(joinedload_all('foo.bar'))
    .options(joinedload_all('foo.baz'))
    .options(joinedload('quux.other')))
What I want to do is filter on the table joined via 'first', but this way doesn't work:
query = query.filter(FooModel.address == '1.2.3.4')
It results in a clause like this attached to the query:
WHERE foos.address = '1.2.3.4'
Which doesn't do the filtering in a proper way, since the generated joins attach tables foos_1 and foos_2. If I try that query manually but change the filtering clause to:
WHERE foos_1.address = '1.2.3.4' AND foos_2.address = '1.2.3.4'
It works fine. The question is of course - how can I achieve this with sqlalchemy itself?
If you want to filter on joins, you use join():
session.query(SomeModel).join(SomeModel.foos).filter(Foo.something=='bar')
joinedload() and joinedload_all() are only a means of loading related collections in one pass; they are not meant for filtering or ordering. Please read:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#joined-load - the note on "joinedload() is not a replacement for join()", as well as:
http://docs.sqlalchemy.org/en/latest/orm/loading.html#the-zen-of-eager-loading
