How to chain multiple joins in SQLAlchemy with different foreign keys - python

I have the following issue: I need to convert this query into Python's SQLAlchemy ORM:
SELECT parts.model_num,
ptemp_objects.ptemp_id, ptemp_objects.type, ptemp_objects.area, ptemp_objects.text, ptemp_objects.x, ptemp_objects.y, ptemp_objects.width, ptemp_objects.height, ptemp_objects.font, ptemp_objects.font_size, ptemp_objects.alignment, ptemp_objects.bold, ptemp_objects.italic, ptemp_objects.display_order,
ptype_areas.x, ptype_areas.y, ptype_areas.name, ptype_areas.width, ptype_areas.height,
paper_types.name, paper_types.width, paper_types.height, paper_types.left_margin, paper_types.right_margin, paper_types.top_margin, paper_types.bottom_margin,
print_images.path
FROM parts
JOIN prints
ON prints.part_id = parts.id
JOIN ptemp_objects
ON prints.ptemp_id = ptemp_objects.ptemp_id
JOIN ptype_areas
ON ptemp_objects.area = ptype_areas.id
JOIN paper_types
ON ptype_areas.ptype_id = paper_types.id
LEFT JOIN print_images
ON ptemp_objects.type = print_images.id
WHERE prints.part_id = 879 AND parts.model_num = 'BD854-20-YN-125-BN';
I have been trying with this:
session.query(Table1, Table2, Table3).select_from(Table1).join(Table2).join(Table3).all()
but I don't know how to build this in SQLAlchemy, nor how to declare it with so many foreign keys.
I am a beginner with this ORM; I've been reading SQLAlchemy's documentation, but I haven't been able to understand it well, and I haven't found any solution for building this query. It would be great if you could help me build it, and a bit of explanation would also be good.
Thanks!
I am using:
Windows 10 Professional.
Python 3.8.8.
Visual Studio Code.
SQLAlchemy 1.4.22

I figured out how to perform multiple joins in SQLAlchemy;
basically, I used this code in Python:
query = (
    session.query(Parts.model_num, Parts.description, PtempObjects.text,
                  PtypeAreas, PrintImages)
    .select_from(Parts)
    .join(Prints, Prints.part_id == Parts.id)
    .join(PtempObjects, Prints.ptemp_id == PtempObjects.ptemp_id)
    .join(PtypeAreas, PtypeAreas.id == PtempObjects.area)
    .join(PrintImages, PrintImages.id == PtempObjects.type, isouter=True)
    .filter(Prints.part_id == 879, Parts.model_num == "BD854-20-YN-125-BN")
)
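The same pattern can be shown on a tiny self-contained example: pass each join target together with an explicit ON clause, and chain the calls. The models and data below are made up for illustration (not the asker's real schema), and this assumes SQLAlchemy 1.4 is installed:

```python
from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Part(Base):
    __tablename__ = "parts"
    id = Column(Integer, primary_key=True)
    model_num = Column(String)

class Print(Base):
    __tablename__ = "prints"
    id = Column(Integer, primary_key=True)
    part_id = Column(Integer, ForeignKey("parts.id"))
    label = Column(String)

engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Part(id=1, model_num="BD854"))
    session.add(Print(id=10, part_id=1, label="front"))
    session.commit()

    rows = (
        session.query(Part.model_num, Print.label)
        .select_from(Part)
        .join(Print, Print.part_id == Part.id)  # explicit ON clause
        .filter(Part.model_num == "BD854")
        .all()
    )
    print(rows)  # [('BD854', 'front')]
```

Each additional `.join(Target, onclause)` extends the FROM clause from the previous join target, which is why the chain above maps one-to-one onto the raw SQL.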

Related

How to Join two tables in Django

select *
from article_article
left join article_upvote ON article_article.id=article_upvote.article_id
I want to write this query with the Django ORM. How can I do this?
You can use prefetch_related:
articles = Article.objects.all().prefetch_related('upvote')
If you want to see the query executed do:
print(articles.query)
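Note that prefetch_related does not translate into a SQL JOIN: it runs a second query for the related rows and stitches the results together in Python. A plain-sqlite3 sketch of that two-query pattern (table and column names are illustrative):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE upvote (id INTEGER PRIMARY KEY, article_id INTEGER);
    INSERT INTO article VALUES (1, 'first'), (2, 'second');
    INSERT INTO upvote VALUES (10, 1), (11, 1);
""")

# Query 1: fetch the articles.
articles = conn.execute("SELECT id, title FROM article").fetchall()

# Query 2: fetch all related upvotes in one go, then group them in Python.
ids = [a[0] for a in articles]
placeholders = ",".join("?" * len(ids))
upvotes = defaultdict(list)
for uid, aid in conn.execute(
        f"SELECT id, article_id FROM upvote WHERE article_id IN ({placeholders})",
        ids):
    upvotes[aid].append(uid)

print([(title, upvotes[aid]) for aid, title in articles])
# [('first', [10, 11]), ('second', [])]
```

If you actually need the LEFT JOIN semantics in a single query (e.g. to aggregate over it), annotate() is the tool instead.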

MYSQL PeeWee Full Join without RawQuery

I am using PeeWee with MySQL. I have two tables that need a full join to keep records from both left and right sides. MySQL doesn't support this directly, but I have used "Method 2" in this helpful article - http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/ - to create a full-join SQL statement that seems to work for my data.
It requires a "UNION ALL" of a "LEFT OUTER JOIN" and a "RIGHT OUTER JOIN", excluding duplicate data in the second result set.
I'm matching up backup-tape barcodes in the two tables.
SQL
SELECT * FROM mediarecall AS mr
LEFT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
UNION ALL
SELECT * FROM mediarecall AS mr
RIGHT OUTER JOIN media AS m ON mr.alternateCode = m.tapeLabel
WHERE mr.alternateCode IS NULL
However, when I came to bring this into my Python script using PeeWee, I discovered that there doesn't seem to be a JOIN.RIGHT_OUTER to let me re-create this SQL. I have used plenty of JOIN.LEFT_OUTER in the past, but this is the first time I have needed a full join.
I can make PeeWee work with a RawQuery(), of course, but I'd love to keep my code looking more elegant if I can.
Has anyone managed to re-create a Full Join with MySQL and PeeWee without resorting to RawQuery?
I had envisaged something like the following (which I know is invalid):-
left_media = (MediaRecall
              .select()
              .join(Media, JOIN.LEFT_OUTER,
                    on=(MediaRecall.alternateCode == Media.tapeLabel)))
right_media = (MediaRecall
               .select()
               .join(Media, JOIN.RIGHT_OUTER,
                     on=(MediaRecall.alternateCode == Media.tapeLabel))
               .where(MediaRecall.alternateCode >> None))  # Exclude duplicates
all_media = (left_media | right_media)  # UNION of the 2 results, which I
                                        # can then use .where(), etc. on
You can add support for right outer:
from peewee import JOIN
JOIN['RIGHT_OUTER'] = 'RIGHT OUTER'
Then you can use JOIN.RIGHT_OUTER.
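The UNION ALL emulation itself can be verified in plain SQL before wiring it into PeeWee. A self-contained sqlite3 sketch with made-up barcode data (SQLite gained native RIGHT/FULL JOIN only in 3.39, so the second leg swaps the table order of a LEFT JOIN instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mediarecall (alternateCode TEXT);
    CREATE TABLE media (tapeLabel TEXT);
    INSERT INTO mediarecall VALUES ('A'), ('B');  -- 'B' exists only on the left
    INSERT INTO media VALUES ('A'), ('C');        -- 'C' exists only on the right
""")

# Leg 1 keeps every left row; leg 2 adds the right-only rows (the WHERE
# clause drops rows already matched in leg 1).
rows = conn.execute("""
    SELECT mr.alternateCode, m.tapeLabel
    FROM mediarecall AS mr
    LEFT JOIN media AS m ON mr.alternateCode = m.tapeLabel
    UNION ALL
    SELECT mr.alternateCode, m.tapeLabel
    FROM media AS m
    LEFT JOIN mediarecall AS mr ON mr.alternateCode = m.tapeLabel
    WHERE mr.alternateCode IS NULL
""").fetchall()
print(sorted(rows, key=str))
# [('A', 'A'), ('B', None), (None, 'C')]
```

The matched pair, the left-only row, and the right-only row all survive, which is exactly the full-join behaviour the UNION ALL construction is emulating.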

How to select specific columns of multi-column join in sqlalchemy?

We are testing the possibility to implement SQLAlchemy to handle our database work. In some instances I need to join a database to a clone of itself (with potentially different data, of course).
An example of the SQL I need to replicate is as follows:
SELECT lt.name, lt.date, lt.type
FROM dbA.dbo.TableName as lt
LEFT JOIN dbB.dbo.TableName as rt
ON lt.name = rt.name
AND lt.date = rt.date
WHERE rt.type is NULL
So far I have tried using the join object, but I can't get it to not spit out the entire join. I have also tried various .join() methods based on the tutorial here: http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html and I keep getting an AttributeError ("mapper") or results that are not what I'm looking for.
The issues I'm running into is that I need to not only join on multiple fields, but I can't have any foreign key relationships built into the objects or tables.
Thanks to Kay's link, I think I figured out the solution.
It looks like it can be solved by:
session.query(dbA_TableName).outerjoin(
    dbB_TableName,
    and_(dbA_TableName.name == dbB_TableName.name,
         dbA_TableName.date == dbB_TableName.date)
).filter(dbB_TableName.type.is_(None))
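The WHERE rt.type IS NULL part is a standard anti-join: the LEFT JOIN fills the unmatched right side with NULLs, and the filter keeps only those rows. It can be checked in isolation with a minimal sqlite3 sketch (made-up data, not the asker's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lt (name TEXT, date TEXT, type TEXT);
    CREATE TABLE rt (name TEXT, date TEXT, type TEXT);
    INSERT INTO lt VALUES ('a', '2021-01-01', 'x'), ('b', '2021-01-02', 'y');
    INSERT INTO rt VALUES ('a', '2021-01-01', 'x');   -- 'b' has no match
""")

# Rows in lt with no (name, date) match in rt: the LEFT JOIN produces NULLs
# for the unmatched side, which the WHERE clause then keeps.
only_left = conn.execute("""
    SELECT lt.name, lt.date, lt.type
    FROM lt
    LEFT JOIN rt ON lt.name = rt.name AND lt.date = rt.date
    WHERE rt.type IS NULL
""").fetchall()
print(only_left)  # [('b', '2021-01-02', 'y')]
```

Note that in SQLAlchemy the NULL test should be written with .is_(None); a plain == None comparison also works but triggers linter warnings, and a quoted string filter is deprecated in 1.4.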

How to do general maths in a SQL query in Django?

The following query I'd love to do in django, ideally without using iteration. I just want the database call to return the result denoted by the query below. Unfortunately according to the docs this doesn't seem to be possible; only the general functions like Avg, Max and Min etc are available. Currently I'm using django 1.4 but I'm happy to rewrite stuff from django 1.8 (hence the docs page; I've heard that 1.8 does a lot of these things much better than 1.4)
select sum(c.attr1 * fs.attr2)/ sum(c.attr1) from fancyStatistics as fs
left join superData s on fs.super_id=s.id
left join crazyData c on s.crazy_id=c.id;
Note:
The main reason for doing this in django directly is that if we ever want to change our database from MySQL to something more appropriate for django, it would be good not to have to rewrite all the queries.
You should be able to get aggregates with F expressions to do most of what you want without dropping into SQL.
https://docs.djangoproject.com/en/1.8/topics/db/aggregation/#joins-and-aggregates
aggregate_dict = FancyStatistics.objects.all()\
    .aggregate(
        sum1=Sum(
            F('superdata__crazydata__attr1') * F('attr2'),
            output_field=FloatField()
        ),
        sum2=Sum('superdata__crazydata__attr1')
    )
result = aggregate_dict['sum1'] / aggregate_dict['sum2']
You need to specify the output fields if the data types used are different.
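The ratio being computed, sum(attr1 * attr2) / sum(attr1), is a weighted average, so you can sanity-check the ORM result against the raw SQL with toy numbers. A self-contained sqlite3 sketch mirroring the question's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crazyData (id INTEGER PRIMARY KEY, attr1 REAL);
    CREATE TABLE superData (id INTEGER PRIMARY KEY, crazy_id INTEGER);
    CREATE TABLE fancyStatistics (super_id INTEGER, attr2 REAL);
    INSERT INTO crazyData VALUES (1, 2.0), (2, 3.0);
    INSERT INTO superData VALUES (10, 1), (20, 2);
    INSERT INTO fancyStatistics VALUES (10, 4.0), (20, 6.0);
""")

# sum(c.attr1 * fs.attr2) / sum(c.attr1) = (2*4 + 3*6) / (2 + 3) = 26 / 5
(result,) = conn.execute("""
    SELECT SUM(c.attr1 * fs.attr2) / SUM(c.attr1)
    FROM fancyStatistics AS fs
    LEFT JOIN superData AS s ON fs.super_id = s.id
    LEFT JOIN crazyData AS c ON s.crazy_id = c.id
""").fetchone()
print(result)  # 5.2
```

The Django aggregate above should produce the same number for the same data, since the double-underscore lookups generate the equivalent joins.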
You can do that query in Django directly using your SQL expression. Check the docs concerning performing raw SQL queries.

Join with Python's SQLite module is slower than doing it manually

I am using pythons built-in sqlite3 module to access a database. My query executes a join between a table of 150000 entries and a table of 40000 entries, the result contains about 150000 entries again. If I execute the query in the SQLite Manager it takes a few seconds, but if I execute the same query from Python, it has not finished after a minute. Here is the code I use:
cursor = self._connection.cursor()
annotationList = cursor.execute("SELECT PrimaryId, GOId " +
"FROM Proteins, Annotations " +
"WHERE Proteins.Id = Annotations.ProteinId")
annotations = defaultdict(list)
for protein, goterm in annotationList:
annotations[protein].append(goterm)
I did the fetchall just to measure the execution time. Does anyone have an explanation for the huge difference in performance? I am using Python 2.6.1 on Mac OS X 10.6.4.
I implemented the join manually, and this works much faster. The code looks like this:
cursor = self._connection.cursor()
proteinList = cursor.execute("SELECT Id, PrimaryId FROM Proteins ").fetchall()
annotationList = cursor.execute("SELECT ProteinId, GOId FROM Annotations").fetchall()
proteins = dict(proteinList)
annotations = defaultdict(list)
for protein, goterm in annotationList:
annotations[proteins[protein]].append(goterm)
So when I fetch the tables myself and then do the join in Python, it takes about 2 seconds. The code above takes forever. Am I missing something here?
I tried the same with apsw, and it works just fine (the code does not need to be changed at all); the performance is great. I'm still wondering why this is so slow with the sqlite3 module.
There is a discussion about it here: http://www.mail-archive.com/python-list@python.org/msg253067.html
It seems that there is a performance bottleneck in the sqlite3 module. There is advice on how to make your queries faster:
make sure that you do have indices on the join columns
use pysqlite
You haven't posted the schema of the tables in question, but I think there might be a problem with indexes, specifically not having an index on Proteins.Id or Annotations.ProteinId (or both).
Create the SQLite indexes like this
CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id);
CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId);
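You can confirm that SQLite actually uses such an index with EXPLAIN QUERY PLAN; a minimal sketch (exact plan wording varies slightly across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Proteins (Id INTEGER, PrimaryId TEXT);
    CREATE TABLE Annotations (ProteinId INTEGER, GOId TEXT);
    CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId
        ON Annotations (ProteinId);
""")

# The plan should report a SEARCH ... USING INDEX on the indexed join
# column instead of a full SCAN of Annotations.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT PrimaryId, GOId
    FROM Proteins JOIN Annotations ON Proteins.Id = Annotations.ProteinId
""").fetchall()
for row in plan:
    print(row[-1])
```

If the plan shows SCAN on both tables, the join column is not indexed and every outer row forces a full scan of the inner table, which matches the slowdown described above.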
I wanted to update this because I am noticing the same issue and we are now 2022...
In my own application I am using python3 and sqlite3 to do some data wrangling on large databases (>100000 rows * >200 columns). In particular, I have noticed that my 3 table inner join clocks in around ~12 minutes of run time in python, whereas running the same join query in sqlite3 from the CLI runs in ~100 seconds. All the join predicates are properly indexed and the EXPLAIN QUERY PLAN indicates that the added time is most likely because I am using SELECT *, which is a necessary evil in my particular context.
The performance discrepancy caused me to pull my hair out all night until I realized there is a quick fix from here: Running a Sqlite3 Script from Command Line. This is definitely a workaround at best, but I have research due so this is my fix.
Write out the query to a .sql file (I am using f-strings to pass variables in, so the example uses {foo} here):
with open("filename.sql", "w") as fi:
    fi.write(f"CREATE TABLE {foo} AS SELECT * FROM Table1 INNER JOIN Table2 ON Table2.KeyColumn = Table1.KeyColumn INNER JOIN Table3 ON Table3.KeyColumn = Table1.KeyColumn;")
Run os.system from inside python and send the .sql file to sqlite3
os.system(f"sqlite3 {database} < filename.sql")
Make sure you close any open connection before running this so you don't end up locked out and you'll have to re-instantiate any connection objects afterward if you're going back to working in sqlite within python.
Hope this helps and if anyone has figured the source of this out, please link to it!
