I am using PeeWee with MySQL. I have two tables that need a full join to keep records from both the left and right sides. MySQL doesn't support this directly, but I have used "Method 2" in this helpful article - http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/ - to create a full join SQL statement that seems to work for my data.
It requires a "UNION ALL" of a "LEFT OUTER JOIN" and a "RIGHT OUTER JOIN", with an exclusion of duplicate data in the second result set.
I'm matching up backup-tape barcodes in the two tables.
SQL
SELECT * FROM mediarecall AS mr
LEFT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
UNION ALL
SELECT * FROM mediarecall AS mr
RIGHT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
WHERE mr.alternateCode IS NULL
However, when I came to bring this into my Python script using PeeWee, I discovered that there doesn't seem to be a JOIN.RIGHT_OUTER to allow me to re-create this SQL. I have used plenty of JOIN.LEFT_OUTER in the past, but this is the first time I have needed a full join.
I can make PeeWee work with a RawQuery(), of course, but I'd love to keep my code looking more elegant if I can.
Has anyone managed to re-create a Full Join with MySQL and PeeWee without resorting to RawQuery?
I had envisaged something like the following (which I know is invalid):
left_media = (MediaRecall
              .select()
              .join(Media, JOIN.LEFT_OUTER,
                    on=(MediaRecall.alternateCode == Media.tapeLabel)))

right_media = (MediaRecall
               .select()
               .join(Media, JOIN.RIGHT_OUTER,
                     on=(MediaRecall.alternateCode == Media.tapeLabel))
               .where(MediaRecall.alternateCode >> None))  # Exclude duplicates

all_media = (left_media | right_media)  # UNION of the 2 results, which I
                                        # can then use .where(), etc. on
You can add support for right outer:
from peewee import JOIN
JOIN['RIGHT_OUTER'] = 'RIGHT OUTER'
Then you can use JOIN.RIGHT_OUTER.
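Building on that, a minimal sketch of the full join from the question (assuming the MediaRecall and Media models from above, and that your peewee version accepts extending JOIN this way) might look like:

from peewee import JOIN

JOIN['RIGHT_OUTER'] = 'RIGHT OUTER'  # per the answer above; recent peewee versions already ship JOIN.RIGHT_OUTER

left_media = (MediaRecall
              .select(MediaRecall, Media)
              .join(Media, JOIN.LEFT_OUTER,
                    on=(MediaRecall.alternateCode == Media.tapeLabel)))

right_media = (MediaRecall
               .select(MediaRecall, Media)
               .join(Media, JOIN.RIGHT_OUTER,
                     on=(MediaRecall.alternateCode == Media.tapeLabel))
               .where(MediaRecall.alternateCode >> None))  # drop rows already covered by the left join

all_media = left_media + right_media  # `+` builds a UNION ALL; `|` would build a UNION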
I am querying a variety of different tables in a MySQL database with SQLAlchemy and raw SQL query code.
My issue right now is renaming some of the columns being joined. The queries all come together into one dataframe.
SELECT *
FROM original
LEFT JOIN table1
on original.id = table1.t1key
LEFT JOIN table2
on original.id = table2.t2key
LEFT JOIN table3
on original.id = table3.t3key;
All I actually want to get from those tables is a single column added to my query. Each table has a column with the same name, so my approach has been to use an alias, as below:
table1.columnchange AS 'table1columnchange'
table2.columnchange AS 'table2columnchange'
table3.columnchange AS 'table3columnchange'
But every way I've tried to implement this ends up with annoying errors.
I am querying around 20 different tables as well, so using SELECT * at the beginning, while inefficient, is ideal for the sake of ease.
The output I'm looking for is a dataframe that has each of the columns I need in it (which I will then filter and build models on with Python). I am fine with managing the query through SQLAlchemy into pandas; the aliasing is what is giving me grief right now.
Thanks in advance
You can use nested queries:
SELECT
    original.column1 AS somename,
    table1.column1 AS somename1,
    table2.column1 AS somename2
FROM
    (SELECT
        id,
        column1
     FROM
        original
    ) original
LEFT JOIN (
    SELECT
        t1key,
        column1
    FROM
        table1
) table1 ON original.id = table1.t1key
LEFT JOIN (
    SELECT
        t2key,
        column1
    FROM
        table2
) table2 ON original.id = table2.t2key
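If the goal is simply to land those aliased columns in a pandas dataframe, a minimal sketch (assuming a SQLAlchemy engine and the table/column names from the question; the connection string is a placeholder) could look like:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; substitute your own credentials.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

query = """
SELECT original.*,
       table1.columnchange AS table1columnchange,
       table2.columnchange AS table2columnchange,
       table3.columnchange AS table3columnchange
FROM original
LEFT JOIN table1 ON original.id = table1.t1key
LEFT JOIN table2 ON original.id = table2.t2key
LEFT JOIN table3 ON original.id = table3.t3key
"""

# The aliases become the dataframe column names.
df = pd.read_sql(query, engine)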
I am trying to join a second table (PageLikes) to a first table (PageVisits) after selecting only distinct values of one column of the first table, using the Python ORM peewee.
In pure SQL I can do this:
SELECT DISTINCT(pagevisits.visitor_id), pagelikes.liked_item FROM pagevisits
INNER JOIN pagelikes on pagevisits.visitor_id = pagelikes.user_id
In peewee with python I have tried:
query = (Pagevisits
         .select(fn.Distinct(Pagevisits.visitor_id),
                 PageLikes.liked_item)
         .join(PageLikes))
This gives me an error:
distinct() takes 1 positional argument but 2 were given
The only way I can and have used distinct with peewee is like this:
query = (Pagevisits
         .select(Pagevisits.visitor_id,
                 PageLikes.liked_item)
         .distinct())
which does not seem to work for my scenario.
So how can I select only distinct values in one table based on one column before I join another table with peewee?
I don't believe you should be encountering an error using fn.DISTINCT() in that way. I'm curious to see the full traceback. In my testing locally, I have no problems running something like:
query = (PageVisits
.select(fn.DISTINCT(PageVisits.visitor_id), PageLikes.liked_item)
.join(PageLikes))
Which produces SQL equivalent to what you're after. I'm using the latest peewee code btw.
As Papooch suggested, calling distinct() on the column seems to work:
distinct_visitors = (Pagevisits
                     .select(Pagevisits.visitor_id.distinct().alias("visitor"))
                     .where(Pagevisits.page_id == "Some specific page")
                     .alias('distinct_visitors'))

query = (Pagelikes
         .select(fn.Count(Pagelikes.liked_item))
         .join(distinct_visitors,
               on=(distinct_visitors.c.visitor == Pagelikes.user_id))
         .group_by(Pagelikes.liked_item))
We are testing the possibility of implementing SQLAlchemy to handle our database work. In some instances I need to join a database to a clone of itself (with potentially different data, of course).
An example of the SQL I need to replicate is as follows:
SELECT lt.name, lt.date, lt.type
FROM dbA.dbo.TableName as lt
LEFT JOIN dbB.dbo.TableName as rt
ON lt.name = rt.name
AND lt.date = rt.date
WHERE rt.type is NULL
So far I have tried using the join object, but I can't get it to not spit out the entire join. I have also tried various .join() methods based on the tutorial here: http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html and I keep getting AttributeError: 'mapper', or results that are not what I'm looking for.
The issue I'm running into is that I not only need to join on multiple fields, but I can't have any foreign key relationships built into the objects or tables.
Thanks to Kay's link I think I figured out the solution.
It looks like it can be solved by:
session.query(dbA_TableName).outerjoin(
    dbB_TableName,
    and_(dbA_TableName.name == dbB_TableName.name,
         dbA_TableName.date == dbB_TableName.date)
).filter(dbB_TableName.type.is_(None))
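For reference, a minimal sketch of how the two mapped classes might be declared without any foreign-key relationships (the schema names follow the question's SQL Server style; the column types and the composite primary key are assumptions for illustration):

from sqlalchemy import Column, Date, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class dbA_TableName(Base):
    __tablename__ = "TableName"
    __table_args__ = {"schema": "dbA.dbo"}   # assumed schema-qualified name
    name = Column(String, primary_key=True)  # assumed composite key; no FKs defined
    date = Column(Date, primary_key=True)
    type = Column(String)

class dbB_TableName(Base):
    __tablename__ = "TableName"
    __table_args__ = {"schema": "dbB.dbo"}
    name = Column(String, primary_key=True)
    date = Column(Date, primary_key=True)
    type = Column(String)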
In my DB, I basically have 3 tables:
usergroup(id, name, deleted)
usergroup_presentation(id, groupid, presentationid)
presentation(id, name)
I'm trying to run this DAL query:
left_join = db.usergroup_presentation.on(
    (db.usergroup_presentation.group_id == db.usergroup.id) &
    (db.usergroup_presentation.presentation_id == db.presentation.id))

result = db(db.usergroup.deleted == False).select(
    db.usergroup.id,
    db.usergroup.name,
    db.usergroup_presentation.id,
    left=left_join,
    orderby=db.usergroup.name)
And SQL returns this error: Unknown column 'presentation.id' in 'on clause'
The generated SQL looks something like this:
SELECT usergroup.id, usergroup.name, usergroup_presentation.id
FROM presentation, usergroup
LEFT JOIN usergroup_presentation ON ((usergroup_presentation.group_id = usergroup.id) AND (usergroup_presentation.presentation_id = presentation.id))
WHERE (usergroup.deleted = 'F')
ORDER BY usergroup.name;
I did some research on Google and found this:
http://mysqljoin.com/joins/joins-in-mysql-5-1054-unknown-column-in-on-clause/
Then I tried to run this query directly in my DB:
SELECT usergroup.id, usergroup.name, usergroup_presentation.id
FROM (presentation, usergroup)
LEFT JOIN usergroup_presentation ON ((usergroup_presentation.group_id = usergroup.id) AND (usergroup_presentation.presentation_id = presentation.id))
WHERE (usergroup.deleted = 'F')
ORDER BY usergroup.name;
And indeed it works when I add the parentheses around the FROM tables.
My question is: how can I generate a SQL query like this (with the parentheses) with the DAL, without resorting to a basic executesql?
Even better, I would like to get a cleaner SQL query using INNER JOIN and LEFT JOIN. I don't know if it's possible with my query though.
I believe this has now been fixed in trunk. Please help us check it. P.S. next time open a ticket (https://code.google.com/p/web2py/issues/list) and it will be fixed sooner.
How do I implement a FULL OUTER JOIN in SQLAlchemy at the ORM level?
Here is my code:
q1 = (db.session.query(
tb1.user_id.label('u_id'),
func.count(tb1.id).label('tb1_c')
)
.group_by(tb1.user_id)
)
q2 = (db.session.query(
tb2.user_id.label('u_id'),
func.count(tb2.id).label('tb2_c')
)
.group_by(tb2.user_id)
)
Those are the two queries, and I want to apply a FULL OUTER JOIN to them.
Since version 1.1, SQLAlchemy fully supports FULL OUTER JOINs. See here: https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.join.params.full
So for your code you would want to do:
q1 = (db.session.query(
tb1.user_id.label('u_id'),
func.count(tb1.id).label('tb1_c')
)
.group_by(tb1.user_id)
).cte('q1')
q2 = (db.session.query(
tb2.user_id.label('u_id'),
func.count(tb2.id).label('tb2_c')
)
.group_by(tb2.user_id)
).cte('q2')
result = db.session.query(
    func.coalesce(q1.c.u_id, q2.c.u_id).label('u_id'),
    q1.c.tb1_c,
    q2.c.tb2_c
).select_from(q1).join(   # select_from() anchors the join on the q1 CTE
    q2,
    q1.c.u_id == q2.c.u_id,
    full=True
)
Note that, as with any FULL OUTER JOIN, tb1_c and tb2_c may be NULL, so you might want to apply a COALESCE to them as well.
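For instance, building on the query above, the counts could be defaulted to 0 like this (a sketch, assuming the same q1/q2 CTEs):

from sqlalchemy import func

result = db.session.query(
    func.coalesce(q1.c.u_id, q2.c.u_id).label('u_id'),
    func.coalesce(q1.c.tb1_c, 0).label('tb1_c'),  # missing counts default to 0
    func.coalesce(q2.c.tb2_c, 0).label('tb2_c'),
).select_from(q1).join(
    q2,
    q1.c.u_id == q2.c.u_id,
    full=True
)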
First of all, sqlalchemy does not support FULL JOIN out of the box, and for some good reasons. So any solution proposed will consist of two parts:
a work-around for missing functionality
sqlalchemy syntax to build a query for that work-around
Now, for the reasons to avoid the FULL JOIN, please read the old blog post Better Alternatives to a FULL OUTER JOIN.
From this very blog post I will take the idea of how to avoid the FULL JOIN by adding 0 values for the missing columns and aggregating (SUM) over a UNION ALL instead. The SA code might look something like below:
q1 = (session.query(
tb1.user_id.label('u_id'),
func.count(tb1.id).label('tb1_c'),
literal(0).label('tb2_c'), # #NOTE: added 0
).group_by(tb1.user_id))
q2 = (session.query(
tb2.user_id.label('u_id'),
literal(0).label('tb1_c'), # #NOTE: added 0
func.count(tb2.id).label('tb2_c')
).group_by(tb2.user_id))
qt = union_all(q1, q2).alias("united")
qr = select([qt.c.u_id, func.sum(qt.c.tb1_c), func.sum(qt.c.tb2_c)]).group_by(qt.c.u_id)
Having composed the query above, I might actually consider other options:
simply execute the two queries separately and aggregate the results in Python itself, as sketched below (fine for not-so-large result sets)
given that this looks like some kind of reporting functionality rather than business-model workflow, create a SQL query and execute it directly via the engine (only if it really performs much better, though)
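For the first option, a minimal sketch of merging the two per-user counts in plain Python (assuming the two grouped queries from the question, each yielding (u_id, count) rows):

from collections import defaultdict

counts = defaultdict(lambda: [0, 0])
for u_id, tb1_c in q1:      # iterating the query executes it
    counts[u_id][0] = tb1_c
for u_id, tb2_c in q2:
    counts[u_id][1] = tb2_c

# counts now maps u_id -> [tb1_c, tb2_c] for users present in either table.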