Automagically including a join when needed in SQLAlchemy - python

I'm building a class around a set of tables. One of those tables, TableDates, includes a field TableDates.date which I use to filter for periods. That's easy. Assuming query already has the needed select construct:
query = query.filter(TableDates.date < date)
And it works when query already includes a proper join with TableDates.
But in some cases, I have a query that does not include a proper join with it. In those cases, I should use:
query = query.join(TableDates).filter(TableDates.date < date)
And here is the problem. I would like to put this filter code in an object method that works both with queries that have already joined TableDates and with ones that have not. Of course, I can track in my own methods when I join with TableDates, to decide whether I have to add join(TableDates) to the query or not. But my question is: is there a way of letting SQLAlchemy do the job? I've been browsing the documentation and didn't find a clue, but maybe I missed it.
More specifically, I was thinking of either:
Having a way of checking a query to know if TableDates is already joined in it.
Having a way of writing the join(TableDates) so that if TableDates is already joined in query, it does nothing (of course, if I just try to add that join when TableDates is already joined, I get an exception stating that I cannot join it with itself).
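Since the question itself mentions tracking joins in your own methods as the fallback, here is a minimal sketch of that approach. It assumes SQLAlchemy 1.4 or later (positional `select()`, `Session` as a context manager); the `Item` table and the `JoinTrackingQuery` wrapper are hypothetical names, only `TableDates` comes from the question.

```python
from datetime import date

from sqlalchemy import Column, Date, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):
    # Hypothetical "main" table that queries start from.
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)


class TableDates(Base):
    __tablename__ = "table_dates"
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey("item.id"))
    date = Column(Date)


class JoinTrackingQuery:
    """Hypothetical wrapper that remembers which entities were already joined."""

    def __init__(self, stmt):
        self.stmt = stmt
        self._joined = set()

    def ensure_join(self, entity):
        # Add the JOIN only the first time this entity is requested;
        # the ON clause is inferred from the foreign key.
        if entity not in self._joined:
            self.stmt = self.stmt.join(entity)
            self._joined.add(entity)
        return self

    def before(self, cutoff):
        # Safe to call whether or not TableDates has been joined already.
        self.ensure_join(TableDates)
        self.stmt = self.stmt.where(TableDates.date < cutoff)
        return self


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([Item(id=1), Item(id=2)])
    session.flush()
    session.add_all([
        TableDates(item_id=1, date=date(2020, 1, 1)),
        TableDates(item_id=2, date=date(2022, 1, 1)),
    ])
    session.commit()

    # Filtering twice must not produce a second JOIN.
    q = JoinTrackingQuery(select(Item)).before(date(2021, 1, 1)).before(date(2020, 6, 1))
    ids = [item.id for item in session.execute(q.stmt).scalars()]
    join_count = str(q.stmt).count("JOIN")
```

The trade-off of this design is that every join has to go through `ensure_join()`; a join added directly on `q.stmt` would not be recorded.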

Related

Returning null where values don't exist in SQLAlchemy (Python)

I've got 3 tables
tblOffers (tsin, offerId)
tblProducts (tsin)
tblThresholds (offerId)
I'm trying to do a select on columns from all 3 tables.
The thing is, there might not be a record in tblThresholds which matches an offerId. In that instance, I still need the information from the other two tables to return... I don't mind if those columns or fields that are missing are null or whatever in the response.
Currently, I'm not getting anything back at all unless there is information in tblThresholds which correctly matches the offerId.
I suspect the issue lies with the way I'm doing the joining but I'm not very experienced with SQL and brand new to SQLAlchemy.
(Using MySQL by the way)
query = db.select([
tblOffers.c.title,
tblOffers.c.currentPrice,
tblOffers.c.rrp,
tblOffers.c.offerId,
tblOffers.c.gtin,
tblOffers.c.status,
tblOffers.c.mpBarcode,
tblThresholds.c.minPrice,
tblThresholds.c.maxPrice,
tblThresholds.c.increment,
tblProducts.c.currentSellerId,
tblProducts.c.brand,
tblOffers.c.productTakealotURL,
tblOffers.c.productLineId
]).select_from(
tblOffers.
join(tblProducts, tblProducts.c.tsin == tblOffers.c.tsinId).
join(tblThresholds, tblThresholds.c.offerId == tblOffers.c.offerId)
)
I'm happy to add to this question or provide more information but since I'm pretty new to this, I don't entirely know what other information might be needed.
Thanks
Try for hours -> ask here -> find the answer minutes later on your own 🤦‍♂️
So for those who might end up here for the same reason I did, here you go.
Turns out SQLAlchemy's join() produces an INNER JOIN by default, so any offer without a matching row in tblThresholds was dropped. I added isouter=True to my join on tblThresholds, which turns it into a LEFT OUTER JOIN, and it worked!
Link to the info in the docs: https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=join#sqlalchemy.orm.query.Query.join.params.isouter
Final code:
query = db.select([
tblOffers.c.title,
tblOffers.c.currentPrice,
tblOffers.c.rrp,
tblOffers.c.offerId,
tblOffers.c.gtin,
tblOffers.c.status,
tblOffers.c.mpBarcode,
tblThresholds.c.minPrice,
tblThresholds.c.maxPrice,
tblThresholds.c.increment,
tblProducts.c.brand,
tblOffers.c.productTakealotURL,
tblOffers.c.productLineId
]).select_from(
tblOffers.
join(tblProducts, tblProducts.c.tsin == tblOffers.c.tsinId).
join(tblThresholds, tblThresholds.c.offerId == tblOffers.c.offerId, isouter=True)
)
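To see the difference isouter=True makes, the sketch below runs the same join shape twice against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later (positional `select()`), and the tables are trimmed-down, hypothetical versions of the ones in the question with only the columns needed for the demo.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select)

metadata = MetaData()
# Trimmed-down, hypothetical versions of the three tables from the question.
tblOffers = Table("tblOffers", metadata,
                  Column("offerId", Integer, primary_key=True),
                  Column("tsinId", Integer),
                  Column("title", String))
tblProducts = Table("tblProducts", metadata,
                    Column("tsin", Integer, primary_key=True))
tblThresholds = Table("tblThresholds", metadata,
                      Column("offerId", Integer, primary_key=True),
                      Column("minPrice", Integer))

engine = create_engine("sqlite://")
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(insert(tblProducts), [{"tsin": 10}, {"tsin": 20}])
    conn.execute(insert(tblOffers), [
        {"offerId": 1, "tsinId": 10, "title": "A"},
        {"offerId": 2, "tsinId": 20, "title": "B"},  # no threshold row
    ])
    conn.execute(insert(tblThresholds), [{"offerId": 1, "minPrice": 5}])


def fetch(outer):
    stmt = (
        select(tblOffers.c.title, tblThresholds.c.minPrice)
        .select_from(
            tblOffers
            .join(tblProducts, tblProducts.c.tsin == tblOffers.c.tsinId)
            # isouter=True renders LEFT OUTER JOIN; False renders a plain JOIN
            .join(tblThresholds,
                  tblThresholds.c.offerId == tblOffers.c.offerId,
                  isouter=outer)
        )
        .order_by(tblOffers.c.offerId)
    )
    with engine.connect() as conn:
        return [tuple(row) for row in conn.execute(stmt)]


inner_rows = fetch(outer=False)  # offer "B" is dropped
outer_rows = fetch(outer=True)   # offer "B" is kept, minPrice is None
```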

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
from django.db import models

class Invoice(models.Model):
    ...

class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and which means worse performance (the database has to evaluate the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
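For reference, the "ideal SQL" mentioned above keeps EXISTS purely in the WHERE clause. The stdlib sqlite3 sketch below runs that form by hand; the `invoice`/`note` table and column names are assumptions mirroring the models above, not the exact SQL Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE note (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER REFERENCES invoice(id),
        text TEXT
    );
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    -- invoice 1 has two notes, invoice 3 has one, invoice 2 has none
    INSERT INTO note (invoice_id, text) VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")

# EXISTS appears only in WHERE, so each row carries a single column
# and there is no extra has_notes value in the result.
cur = conn.execute("""
    SELECT invoice.id FROM invoice
    WHERE EXISTS (SELECT 1 FROM note WHERE note.invoice_id = invoice.id)
    ORDER BY invoice.id
""")
column_count = len(cur.description)
invoice_ids = [row[0] for row in cur.fetchall()]
```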
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django which annotations to skip, even without .values():
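from django.db.models import QuerySet

class YourQuerySet(QuerySet):
    # Attach to the model with e.g. objects = YourQuerySet.as_manager()
    def mask_annotations(self, *names):
        # Hide the given annotations from the SELECT clause; they can
        # still be used in filter(). Relies on internal Django API.
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self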
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes in the SELECT clause while still getting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, and it can now be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices whose joined Note is not NULL (i.e. Invoices that have at least one note), and make the query distinct to avoid returning the same Invoice once per matching note.
Invoice.objects.filter(notes__isnull=False).distinct()
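To see why `.distinct()` is needed there, the sketch below runs the equivalent SQL by hand with stdlib sqlite3 (the table names are assumptions, and Django's actual query text will differ): joining invoices to notes repeats an invoice once per matching note, and DISTINCT collapses the repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE note (id INTEGER PRIMARY KEY, invoice_id INTEGER, text TEXT);
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    INSERT INTO note (invoice_id, text) VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")

join_sql = """
    SELECT {distinct} invoice.id FROM invoice
    LEFT OUTER JOIN note ON note.invoice_id = invoice.id
    WHERE note.id IS NOT NULL
    ORDER BY invoice.id
"""
# Without DISTINCT, invoice 1 comes back once per note.
with_dupes = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
# With DISTINCT, each invoice that has at least one note appears once.
deduped = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]
```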
This is the most optimized code if you want to get data from another table whose rows reference this table's primary key:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

How to select specific columns of multi-column join in sqlalchemy?

We are testing the possibility to implement SQLAlchemy to handle our database work. In some instances I need to join a database to a clone of itself (with potentially different data, of course).
An example of the SQL I need to replicate is as follows:
SELECT lt.name, lt.date, lt.type
FROM dbA.dbo.TableName as lt
LEFT JOIN dbB.dbo.TableName as rt
ON lt.name = rt.name
AND lt.date = rt.date
WHERE rt.type is NULL
So far I have tried using the join object, but I can't stop it from returning every column of the join. I have also tried various .join() methods based on the tutorial here: http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html and I keep getting either an AttributeError: "mapper" or a result that isn't what I'm looking for.
The issue I'm running into is that I not only need to join on multiple fields, but I also can't have any foreign key relationships built into the objects or tables.
Thanks to Kay's link, I think I figured out the solution.
It looks like it can be solved by:
from sqlalchemy import and_

session.query(dbA_TableName).outerjoin(
    dbB_TableName,
    and_(dbA_TableName.name == dbB_TableName.name,
         dbA_TableName.date == dbB_TableName.date)
).filter(dbB_TableName.type.is_(None))
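To return only specific columns (as in the original SQL) rather than whole entities, you can pass the columns themselves to the query. The sketch below assumes SQLAlchemy 1.4 or later and uses two hypothetical mapped classes, `LeftTable` and `RightTable`, as stand-ins for dbA.dbo.TableName and dbB.dbo.TableName in a single SQLite database. On SQL Server, SQLAlchemy's dialect should accept a multipart schema such as "dbA.dbo" for pointing each class at its database, but that setup is not shown or tested here.

```python
from datetime import date

from sqlalchemy import Column, Date, Integer, String, and_, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LeftTable(Base):
    # Hypothetical stand-in for dbA.dbo.TableName
    __tablename__ = "left_table"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    date = Column(Date)
    type = Column(String)


class RightTable(Base):
    # Hypothetical stand-in for dbB.dbo.TableName; no foreign keys anywhere
    __tablename__ = "right_table"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    date = Column(Date)
    type = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

stmt = (
    # Listing columns instead of the entity limits the SELECT list.
    select(LeftTable.name, LeftTable.date, LeftTable.type)
    # Explicit two-column ON clause, no relationship required.
    .outerjoin(RightTable, and_(LeftTable.name == RightTable.name,
                                LeftTable.date == RightTable.date))
    # Keep only left rows with no match on the right.
    .where(RightTable.type.is_(None))
    .order_by(LeftTable.name, LeftTable.date)
)

with Session(engine) as session:
    session.add_all([
        LeftTable(name="x", date=date(2020, 1, 1), type="t1"),
        LeftTable(name="y", date=date(2020, 1, 2), type="t2"),
        LeftTable(name="x", date=date(2020, 1, 3), type="t3"),
        RightTable(name="x", date=date(2020, 1, 1), type="t1"),
        RightTable(name="y", date=date(2020, 1, 5), type="t2"),
    ])
    session.commit()
    missing = [tuple(row) for row in session.execute(stmt)]
```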

How to generate a random Id?

I have a Postgres 9.3 table with a column called id as the PKEY; id is char(9) and only allows lowercase a-z and 0-9. I use Python with psycopg to insert into this table.
When I need to insert into this table, I call a Python function get_new_id(), my question is, how to make get_new_id() efficient?
I have the following solutions, none of them satisfy me.
a) Pre-generate a lot of ids and store them in some table; when I need a new id, I SELECT one from this table, delete it from the table, and return the selected id. The downside of this solution is that the table needs to be maintained: each get_new_id() call also has to run a SELECT COUNT to find out whether more ids need to be generated and added to it.
b) When get_new_id() is called, it generates a random id and passes it to a stored procedure that checks whether the id is already in use; if not, we are good, and if so, do b) again. The downside of this solution is that, as the table gets bigger, the failure rate may become high, and there is a chance that two get_new_id() calls in two processes generate the same id, say 1234567; since 1234567 is not used as a PKEY yet, one of the processes will fail on insert.
I think this is a pretty old problem, what's the perfect solution?
Edit
I think this has been answered, see Jon Clements' comment.
Offtopic because you already have a char(9) datatype:
I would use an UUID when a random string is needed, it's a standard and almost any programming language (including Python) can generate UUIDs for you.
PostgreSQL can also do it for you, using the uuid-ossp extension.
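On the Python side, the standard library's uuid module generates UUIDs directly. A minimal sketch: `uuid4()` is the random variant, and its `.hex` form is 32 lowercase hex digits, which already satisfies an a-z0-9 constraint, though it assumes the column is widened beyond char(9).

```python
import uuid

# Random (version 4) UUID rendered as 32 lowercase hex digits, no hyphens.
new_id = uuid.uuid4().hex
# The canonical string form is 36 characters, including four hyphens.
canonical = str(uuid.uuid4())
```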
select left(md5(random()::text || now()), 9);
left
-----------
c4c384561
Make the id the primary key and try the insert. If an exception is thrown, catch it and retry. Nothing fancy about it. Why only 9 characters? Make it the full 32.
Check this answer for how to make it smaller: https://stackoverflow.com/a/15982876/131874
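The "try the insert and retry" approach can be sketched as below. It uses stdlib sqlite3 as a stand-in for Postgres so it runs anywhere; the `item` table, `payload` column and function names are hypothetical. With psycopg2 you would presumably catch `psycopg2.IntegrityError` instead and roll back the failed transaction before retrying. Because uniqueness is enforced by the primary key itself, the race described in option b) is harmless: the losing process just gets an error and draws a new id.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_lowercase + string.digits  # a-z0-9, as in the question
ID_LENGTH = 9


def new_id():
    # secrets uses the OS CSPRNG; each character is chosen independently.
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def insert_with_new_id(conn, payload, max_attempts=5):
    """Insert a row under a fresh random id, retrying on a primary-key clash."""
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO item (id, payload) VALUES (?, ?)",
                             (candidate, payload))
            return candidate
        except sqlite3.IntegrityError:
            continue  # id already taken: draw another one
    raise RuntimeError("could not find a free id")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id CHAR(9) PRIMARY KEY, payload TEXT)")
ids = [insert_with_new_id(conn, f"row {i}") for i in range(100)]
```

With 36^9 (about 1.0 × 10^14) possible ids, each attempt collides with probability of roughly (rows in the table) / 36^9, so retries stay rare until the table is very large.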

SQLAlchemy: Perform double filter and sum in the same query

I have a general ledger table in my DB with the columns: member_id, is_credit and amount. I want to get the current balance of the member.
Ideally that can be got by two queries where the first query has is_credit == True and the second query is_credit == False something close to:
credit_amount = session.query(func.sum(Funds.amount).label('Credit_Amount')).filter(Funds.member_id == member_id, Funds.is_credit == True).scalar()
debit_amount = session.query(func.sum(Funds.amount).label('Debit_Amount')).filter(Funds.member_id == member_id, Funds.is_credit == False).scalar()
balance = credit_amount - debit_amount
and then subtract the result. Is there a way to have the above run in one query to give the balance?
In the comments you said that hybrids are too advanced for now, so I will propose an easier but less efficient solution (still, it's okay):
(session.query(Funds.is_credit, func.sum(Funds.amount).label('Debit_Amount')).
filter(Funds.member_id == member_id).group_by(Funds.is_credit))
What will this do? You will receive a two-row result: one row holds the credit total and the other the debit total, depending on the is_credit value of that row. The second column (Debit_Amount) holds the sum. You then evaluate both rows to get the balance: only one query fetches both values.
If you are unsure what group_by does, I recommend you read up on SQL before doing it in SQLAlchemy. SQLAlchemy offers very easy usage of SQL but it requires that you understand SQL as well. Thus, I recommend: First build a query in SQL and see that it does what you want - then translate it to SQLAlchemy and see that it does the same. Otherwise SQLAlchemy will often generate highly inefficient queries, because you asked for the wrong thing.
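The group_by answer above, including the final "evaluate them" step, can be sketched end to end as follows. It assumes SQLAlchemy 1.4 or later and a hypothetical `Funds` mapping with integer amounts; the balance is credits minus debits, treating a missing group as zero.

```python
from sqlalchemy import Boolean, Column, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Funds(Base):
    # Hypothetical mapping of the general ledger table from the question.
    __tablename__ = "funds"
    id = Column(Integer, primary_key=True)
    member_id = Column(Integer)
    is_credit = Column(Boolean)
    amount = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Funds(member_id=1, is_credit=True, amount=100),
        Funds(member_id=1, is_credit=True, amount=50),
        Funds(member_id=1, is_credit=False, amount=30),
        Funds(member_id=2, is_credit=True, amount=999),  # another member
    ])
    session.commit()

    member_id = 1
    # One query, at most one row per is_credit value.
    rows = (session.query(Funds.is_credit, func.sum(Funds.amount).label('Debit_Amount'))
            .filter(Funds.member_id == member_id)
            .group_by(Funds.is_credit)
            .all())

    totals = {is_credit: total for is_credit, total in rows}
    # A missing group means the member has no rows of that kind.
    balance = totals.get(True, 0) - totals.get(False, 0)
```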
