Convert SQL query with JOIN ON to SQLAlchemy - python

My query looks like so (the '3' and '4' of course will be different in real usage):
SELECT op_entries.*, op_entries_status.*
FROM op_entries
LEFT OUTER JOIN op_entries_status ON op_entries.id = op_entries_status.op_id AND op_entries_status.order_id = 3
WHERE op_entries.op_artikel_id = 4 AND op_entries.active
ORDER BY op_entries.id
This is to get all stages (operations) in the production of an article/order combination, as well as the current status (progress) for each stage, if a status entry exists. If not, the stage must still be returned, with the status columns null.
I'm having immense problems getting this to work in SQLAlchemy. This would have been a two-part question, but I already found out how to do it in plain SQL here. The ORM is a different story: I can't even figure out from the documentation how to write JOIN ON conditions!
Edit (new users are not allowed to answer their own question):
Believe I solved it, I guess writing it down as a question helped! Maybe this will help some other newbie out there.
query = db.session.query(OpEntries, OpEntriesStatus).\
outerjoin(OpEntriesStatus, db.and_(
OpEntries.id == OpEntriesStatus.op_id,
OpEntriesStatus.order_id == arg_order_id)).\
filter(db.and_(
OpEntries.op_artikel_id == artidQuery,
OpEntries.active)).\
order_by(OpEntries.id).\
all()
I'm keeping this open in case someone has a better solution or any insights.

Assuming some naming convention, the below should do it:
qry = (session.query(OpEntry, OpEntryStatus)
.outerjoin(OpEntryStatus, and_(OpEntry.id == OpEntryStatus.op_id, OpEntryStatus.order_id == 3))
.filter(OpEntry.op_artikel_id == 4)
.filter(OpEntry.active == 1)
.order_by(OpEntry.id)
)
Read the documentation for join and outerjoin for more information on joins; the second parameter is the ON clause. If you need more than one condition, just use and_ or or_ to build any expression you need.
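To see why the order_id condition belongs in the ON clause rather than in WHERE, here is a minimal sketch that runs the question's SQL directly through Python's built-in sqlite3 module instead of SQLAlchemy. The sample rows and the progress column are made up for illustration; only the table and key column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE op_entries (id INTEGER PRIMARY KEY, op_artikel_id INTEGER, active INTEGER);
    CREATE TABLE op_entries_status (op_id INTEGER, order_id INTEGER, progress TEXT);
    INSERT INTO op_entries VALUES (1, 4, 1), (2, 4, 1);
    -- Stage 1 has a status row for order 3; stage 2 only has one for another order.
    INSERT INTO op_entries_status VALUES (1, 3, 'done'), (2, 9, 'started');
""")

# Condition inside ON: every active stage is kept, status is NULL when missing.
on_rows = conn.execute("""
    SELECT op_entries.id, op_entries_status.progress
    FROM op_entries
    LEFT OUTER JOIN op_entries_status
      ON op_entries.id = op_entries_status.op_id
     AND op_entries_status.order_id = ?
    WHERE op_entries.op_artikel_id = ? AND op_entries.active
    ORDER BY op_entries.id
""", (3, 4)).fetchall()

# Same condition moved to WHERE: stages without a matching status are filtered out.
where_rows = conn.execute("""
    SELECT op_entries.id, op_entries_status.progress
    FROM op_entries
    LEFT OUTER JOIN op_entries_status
      ON op_entries.id = op_entries_status.op_id
    WHERE op_entries.op_artikel_id = ? AND op_entries.active
      AND op_entries_status.order_id = ?
    ORDER BY op_entries.id
""", (4, 3)).fetchall()

print(on_rows)     # [(1, 'done'), (2, None)]
print(where_rows)  # [(1, 'done')]
```

Passing a combined expression as the second argument of outerjoin is what keeps the extra condition in the ON clause; putting it in filter() would correspond to the second query.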

Related

Returning null where values don't exist in SQLAlchemy (Python)

I've got 3 tables
tblOffers (tsin, offerId)
tblProducts (tsin)
tblThresholds (offerId)
I'm trying to do a select on columns from all 3 tables.
The thing is, there might not be a record in tblThresholds which matches an offerId. In that instance, I still need the information from the other two tables to return... I don't mind if those columns or fields that are missing are null or whatever in the response.
Currently, I'm not getting anything back at all unless there is information in tblThresholds which correctly matches the offerId.
I suspect the issue lies with the way I'm doing the joining but I'm not very experienced with SQL and brand new to SQLAlchemy.
(Using MySQL by the way)
query = db.select([
tblOffers.c.title,
tblOffers.c.currentPrice,
tblOffers.c.rrp,
tblOffers.c.offerId,
tblOffers.c.gtin,
tblOffers.c.status,
tblOffers.c.mpBarcode,
tblThresholds.c.minPrice,
tblThresholds.c.maxPrice,
tblThresholds.c.increment,
tblProducts.c.currentSellerId,
tblProducts.c.brand,
tblOffers.c.productTakealotURL,
tblOffers.c.productLineId
]).select_from(
tblOffers.
join(tblProducts, tblProducts.c.tsin == tblOffers.c.tsinId).
join(tblThresholds, tblThresholds.c.offerId == tblOffers.c.offerId)
)
I'm happy to add to this question or provide more information but since I'm pretty new to this, I don't entirely know what other information might be needed.
Thanks
Try for hours -> ask here -> find the answer minutes later on your own 🤦‍♂️
So for those who might end up here for the same reason I did, here you go.
Turns out join() in SQLAlchemy produces an INNER JOIN by default, so any offer without a matching row in tblThresholds was dropped. I added isouter=True to my join on tblThresholds, which turns it into a LEFT OUTER JOIN, and it worked!
Link to the info in the docs: https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=join#sqlalchemy.orm.query.Query.join.params.isouter
Final code:
query = db.select([
tblOffers.c.title,
tblOffers.c.currentPrice,
tblOffers.c.rrp,
tblOffers.c.offerId,
tblOffers.c.gtin,
tblOffers.c.status,
tblOffers.c.mpBarcode,
tblThresholds.c.minPrice,
tblThresholds.c.maxPrice,
tblThresholds.c.increment,
tblProducts.c.brand,
tblOffers.c.productTakealotURL,
tblOffers.c.productLineId
]).select_from(
tblOffers.
join(tblProducts, tblProducts.c.tsin == tblOffers.c.tsinId).
join(tblThresholds, tblThresholds.c.offerId == tblOffers.c.offerId, isouter=True)
)
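The difference between the two join types can be checked without MySQL or SQLAlchemy. The sketch below uses Python's built-in sqlite3 module and trimmed-down versions of two of the tables (only offerId, title and minPrice, with invented sample rows), so treat it as an illustration of the SQL semantics rather than of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblOffers (offerId INTEGER, title TEXT);
    CREATE TABLE tblThresholds (offerId INTEGER, minPrice REAL);
    INSERT INTO tblOffers VALUES (1, 'with threshold'), (2, 'without threshold');
    INSERT INTO tblThresholds VALUES (1, 9.5);
""")

sql = """
    SELECT o.offerId, t.minPrice
    FROM tblOffers AS o
    {join} tblThresholds AS t ON t.offerId = o.offerId
    ORDER BY o.offerId
"""

# Plain JOIN is an inner join: offers without a threshold row disappear.
inner_rows = conn.execute(sql.format(join="JOIN")).fetchall()

# LEFT OUTER JOIN (what isouter=True asks for) keeps them, with NULL thresholds.
outer_rows = conn.execute(sql.format(join="LEFT OUTER JOIN")).fetchall()

print(inner_rows)  # [(1, 9.5)]
print(outer_rows)  # [(1, 9.5), (2, None)]
```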

Unable to access aliased fields in SQLAlchemy query results?

Confused working with query object results. I am not using foreign keys in this example.
lookuplocation = aliased(ValuePair)
lookupoccupation = aliased(ValuePair)
persons = db.session.query(Person.lastname, lookuplocation.displaytext, lookupoccupation.displaytext).\
outerjoin(lookuplocation, Person.location == lookuplocation.valuepairid).\
outerjoin(lookupoccupation, Person.occupation1 == lookupoccupation.valuepairid).all()
Results are correct as far as data is concerned. However, when I try to access an individual row of data I have an issue:
persons[0].lastname works as I expected and returns data.
However, there is only a single displaytext attribute on the result row: because both aliased columns share the name displaytext, I get just one value. I understand why that happens, but I need to know which aliased field names to use to get the two displaytext columns.
The actual SQL statement generated by the above join is as follows:
SELECT person.lastname AS person_lastname, valuepair_1.displaytext AS valuepair_1_displaytext, valuepair_2.displaytext AS valuepair_2_displaytext
FROM person LEFT OUTER JOIN valuepair AS valuepair_1 ON person.location = valuepair_1.valuepairid LEFT OUTER JOIN valuepair AS valuepair_2 ON person.occupation1 = valuepair_2.valuepairid
But none of these "as" field names are available to me in the results.
I'm new to SqlAlchemy so most likely this is a "newbie" issue.
Thanks.
Sorry - RTFM issue - should have been:
lookuplocation.displaytext.label("myfield1"),
lookupoccupation.displaytext.label("myfield2")
After the results are returned, reference the fields by their labels, e.g. persons[0].myfield1 and persons[0].myfield2.
Simple.
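The same idea at the SQL level is simply giving each column its own AS name. Here is a small sketch using Python's built-in sqlite3 module rather than SQLAlchemy; the sample rows and the short table aliases loc and occ are invented for the example, while myfield1 and myfield2 are the labels from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.executescript("""
    CREATE TABLE valuepair (valuepairid INTEGER PRIMARY KEY, displaytext TEXT);
    CREATE TABLE person (lastname TEXT, location INTEGER, occupation1 INTEGER);
    INSERT INTO valuepair VALUES (1, 'Berlin'), (2, 'Engineer');
    INSERT INTO person VALUES ('Smith', 1, 2);
""")

# Join the lookup table twice and give each displaytext column a distinct name.
row = conn.execute("""
    SELECT person.lastname AS lastname,
           loc.displaytext AS myfield1,
           occ.displaytext AS myfield2
    FROM person
    LEFT OUTER JOIN valuepair AS loc ON person.location = loc.valuepairid
    LEFT OUTER JOIN valuepair AS occ ON person.occupation1 = occ.valuepairid
""").fetchone()

print(row["lastname"], row["myfield1"], row["myfield2"])  # Smith Berlin Engineer
```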

Web2Py DAL, left join and operator precedence

In my DB, I've basically 3 tables:
usergroup(id, name, deleted)
usergroup_presentation(id, groupid, presentationid)
presentation(id, name)
I'm trying to run this DAL query:
left_join = db.usergroup_presentation.on((db.usergroup_presentation.group_id==db.usergroup.id)
&(db.usergroup_presentation.presentation_id==db.presentation.id))
result = db(db.usergroup.deleted==False).select(
db.usergroup.id,
db.usergroup.name,
db.usergroup_presentation.id,
left=left_join,
orderby=db.usergroup.name)
MySQL returns this error: Unknown column 'presentation.id' in 'on clause'
The generated SQL looks something like this:
SELECT usergroup.id, usergroup.name, usergroup_presentation.id
FROM presentation, usergroup
LEFT JOIN usergroup_presentation ON ((usergroup_presentation.group_id = usergroup.id) AND (usergroup_presentation.presentation_id = presentation.id))
WHERE (usergroup.deleted = 'F')
ORDER BY usergroup.name;
I did some research on Google and found this:
http://mysqljoin.com/joins/joins-in-mysql-5-1054-unknown-column-in-on-clause/
Then I tried to run this query directly in my DB:
SELECT usergroup.id, usergroup.name, usergroup_presentation.id
FROM (presentation, usergroup)
LEFT JOIN usergroup_presentation ON ((usergroup_presentation.group_id = usergroup.id) AND (usergroup_presentation.presentation_id = presentation.id))
WHERE (usergroup.deleted = 'F')
ORDER BY usergroup.name;
And indeed it works when adding the brackets around the FROM tables.
My question is: how can I generate a SQL query like this (with brackets) using the DAL, without falling back to a raw executesql call?
Even better, I would like to get a cleaner SQL query using INNER JOIN and LEFT JOIN. I don't know if it's possible with my query though.
I believe this has now been fixed in trunk. Please help us check it. P.S. next time open a ticket (https://code.google.com/p/web2py/issues/list) and it will be fixed sooner.

SQLAlchemy: Perform double filter and sum in the same query

I have a general ledger table in my DB with the columns: member_id, is_credit and amount. I want to get the current balance of the member.
Ideally that can be got by two queries where the first query has is_credit == True and the second query is_credit == False something close to:
credit_amount = session.query(func.sum(Funds.amount).label('Credit_Amount')).filter(Funds.member_id==member_id, Funds.is_credit==True).scalar()
debit_amount = session.query(func.sum(Funds.amount).label('Debit_Amount')).filter(Funds.member_id==member_id, Funds.is_credit==False).scalar()
balance = credit_amount - debit_amount
and then subtract the result. Is there a way to have the above run in one query to give the balance?
From the comments, you state that hybrids are too advanced right now, so I will propose an easier but less efficient solution (it's still okay):
(session.query(Funds.is_credit, func.sum(Funds.amount).label('Debit_Amount')).
filter(Funds.member_id==member_id).group_by(Funds.is_credit))
What will this do? You will receive a two-row result: one row holds the credit total and the other the debit total, distinguished by the is_credit value of each row. The second column (Debit_Amount) holds the sum. You then combine the two to get the balance, so a single query fetches both values.
If you are unsure what group_by does, I recommend you read up on SQL before doing it in SQLAlchemy. SQLAlchemy offers very easy usage of SQL but it requires that you understand SQL as well. Thus, I recommend: First build a query in SQL and see that it does what you want - then translate it to SQLAlchemy and see that it does the same. Otherwise SQLAlchemy will often generate highly inefficient queries, because you asked for the wrong thing.
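As a rough illustration of the "evaluate them" step, the sketch below runs the equivalent GROUP BY query through Python's built-in sqlite3 module (not SQLAlchemy) and folds the two rows into a balance. The funds table contents and the balance helper are hypothetical, made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE funds (member_id INTEGER, is_credit INTEGER, amount INTEGER);
    INSERT INTO funds VALUES (7, 1, 100), (7, 1, 50), (7, 0, 30), (8, 1, 999);
""")

def balance(conn, member_id):
    """Return credits minus debits for one member using a single grouped query."""
    rows = conn.execute(
        "SELECT is_credit, SUM(amount) FROM funds "
        "WHERE member_id = ? GROUP BY is_credit",
        (member_id,),
    ).fetchall()
    # At most two rows come back: one for credits and one for debits.
    totals = {bool(is_credit): total for is_credit, total in rows}
    # A missing group (e.g. no debits yet) counts as zero.
    return totals.get(True, 0) - totals.get(False, 0)

print(balance(conn, 7))  # 120
```

Defaulting a missing group to zero matters here: a member with only credits produces a one-row result, and a member with no ledger entries produces none.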

How can I query rows with unique values on a joined column?

I'm trying to have my popular_query subquery remove dupe Place.id, but it doesn't remove it. This is the code below. I tried using distinct but it does not respect the order_by rule.
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
popular_query = (db.session.query(Post, func.count(SimilarPost.id)).
join(Place, Place.id == Post.place_id).
join(PostOption, PostOption.post_id == Post.id).
outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val).
join(SimilarPost,SimilarPost.id == SimilarPostOption.post_id).
filter(Place.id == Post.place_id).
filter(self.radius_cond()).
group_by(Post.id).
group_by(Place.id).
order_by(desc(func.count(SimilarPost.id))).
order_by(desc(Post.timestamp))
).subquery().select()
all_posts = db.session.query(Post).select_from(filter.pick()).all()
I did a test printout with
print [x.place.name for x in all_posts]
[u'placeB', u'placeB', u'placeB', u'placeC', u'placeC', u'placeA']
How can I fix this?
Thanks!
This should get you what you want:
SimilarPost = aliased(Post)
SimilarPostOption = aliased(PostOption)
post_popularity = (db.session.query(func.count(SimilarPost.id))
.select_from(PostOption)
.filter(PostOption.post_id == Post.id)
.correlate(Post)
.outerjoin(SimilarPostOption, PostOption.val == SimilarPostOption.val)
.join(SimilarPost, sql.and_(
SimilarPost.id == SimilarPostOption.post_id,
SimilarPost.place_id == Post.place_id)
)
.as_scalar())
popular_post_id = (db.session.query(Post.id)
.filter(Post.place_id == Place.id)
.correlate(Place)
.order_by(post_popularity.desc())
.limit(1)
.as_scalar())
deduped_posts = (db.session.query(Post, post_popularity)
.join(Place)
.filter(Post.id == popular_post_id)
.order_by(post_popularity.desc(), Post.timestamp.desc())
.all())
I can't speak to the runtime performance with large data sets, and there may be a better solution, but that's what I managed to synthesize from quite a few sources (MySQL JOIN with LIMIT 1 on joined table, SQLAlchemy - subquery in a WHERE clause, SQLAlchemy Query documentation). The biggest complicating factor is that you apparently need to use as_scalar to nest the subqueries in the right places, and therefore can't return both the Post id and the count from the same subquery.
FWIW, this is kind of a behemoth and I concur with user1675804 that SQLAlchemy code this deep is hard to grok and not very maintainable. You should take a hard look at any more low-tech solutions available like adding columns to the db or doing more of the work in python code.
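Stripped of the ORM, the core of the approach is a correlated subquery with LIMIT 1 that picks the single best post per place. The sketch below shows that pattern with Python's built-in sqlite3 module; the popularity column is a hypothetical stand-in for the similar-post count, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, place_id INTEGER, popularity INTEGER);
    INSERT INTO place VALUES (1, 'placeA'), (2, 'placeB'), (3, 'placeC');
    INSERT INTO post VALUES
        (10, 2, 5), (11, 2, 9), (12, 2, 1),
        (20, 3, 4), (21, 3, 2),
        (30, 1, 3);
""")

# For each place, keep only the post whose id equals the top-ranked post id
# returned by the correlated subquery (ties broken by the lower id).
rows = conn.execute("""
    SELECT place.name, post.id, post.popularity
    FROM post
    JOIN place ON place.id = post.place_id
    WHERE post.id = (
        SELECT p2.id FROM post AS p2
        WHERE p2.place_id = place.id
        ORDER BY p2.popularity DESC, p2.id
        LIMIT 1
    )
    ORDER BY post.popularity DESC
""").fetchall()

print(rows)  # [('placeB', 11, 9), ('placeC', 20, 4), ('placeA', 30, 3)]
```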
I don't want to sound like the bad guy here, but in my opinion your approach to the issue seems far from optimal. If you're using PostgreSQL you could simplify the whole thing using WITH. A better approach, assuming these posts will be read much more often than they are updated, would be to add some columns to your tables that are maintained by triggers on insert/update of the other tables. If performance is ever likely to become an issue, this is the solution I'd go with.
I'm not very familiar with SQLAlchemy, so I can't write it out in code for you, but the only other solution I can come up with uses at least one subquery to select the ORDER BY values for each of the columns in GROUP BY, and that will add significantly to your already slow query.
