How to join detail twice on a master in SQLAlchemy?

Situation: I have two tables, say 'master' and 'detail', where 'master' has two columns that refer to 'detail': 'foo_id' and 'bar_id'. That is, I need to join 'detail' twice, under two different aliases. I want to do:
SELECT master.id, foo.name, bar.name, other stuff ...
FROM master
JOIN detail AS foo ON foo.id = master.foo_id
JOIN detail AS bar ON bar.id = master.bar_id
how do I do that using SQLAlchemy?
Note that I am not using the ORM. Also, I am referring to database objects through metadata by string names, so I write table.c["foo_id"] instead of table.c.foo_id (in case that is relevant to the statement construction).

After fighting with the problem for quite a long time, I solved it a couple of minutes after posting my question here. The solution was to store and then reuse all aliased tables. Previously I was using one alias in the join and then constructing the same alias again to fetch the column reference for the SELECT, which resulted in a redundant and broken 'FROM detail AS foo, detail AS bar, master JOIN detail ... JOIN detail ...'.
Working solution:
Create an empty dictionary for aliased tables, mapping each alias name to its Table object.
For each master-detail join, register a new aliased table in the dictionary:
detail_table = Table(name, ...).alias(alias)
tables[alias] = detail_table
Join the next aliased table:
expression = join(expression, detail_table)
When collecting fields for the SELECT, do not construct another Table(name, ...); fetch the column from the aliased table registry instead:
column = tables[table_name].c[column_name]
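Putting the pieces together, here is a minimal self-contained sketch of the approach in SQLAlchemy Core (1.4-style select(); the inline table definitions are illustrative stand-ins for metadata reflected from the database):

from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()
master = Table("master", metadata,
               Column("id", Integer, primary_key=True),
               Column("foo_id", Integer),
               Column("bar_id", Integer))
detail = Table("detail", metadata,
               Column("id", Integer, primary_key=True),
               Column("name", String))

# store each alias once, then reuse the same object in the joins and the SELECT
tables = {"foo": detail.alias("foo"), "bar": detail.alias("bar")}

expression = master.join(
    tables["foo"], tables["foo"].c["id"] == master.c["foo_id"]
).join(
    tables["bar"], tables["bar"].c["id"] == master.c["bar_id"]
)

stmt = select(master.c["id"],
              tables["foo"].c["name"],
              tables["bar"].c["name"]).select_from(expression)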

Related

SQLAlchemy alias a joined table in a many-to-many relation

In a legacy database we have, there is a pretty special data structure with two many-to-many relations joining the same two tables, companies and paymentschedules.
There is a many-to-many relation using an association table called companies_paymentschedules, and a second many-to-many relation using an association table called companies_comp_paymentschedules.
Both relations serve different purposes.
companies_paymentschedules stores paymentschedules for which the company has a discount, companies_comp_paymentschedules stores paymentschedules that are linked to the company.
(I know that this could be simplified by replacing these tables with a single lookup table, but that is not an option in this legacy database.)
The problem is that I need to join both types of companies (discounted and linked) in the same query. SQLAlchemy joins both association tables without a problem, but it also joins the companies table twice and calls it "companies" both times, which leads to a SQL syntax error (using MSSQL, BTW).
This is the query:
q = Paymentschedules.query
# join companies if a company is specified
if company is not None:
    q = q.join(Paymentschedules.companies)
    q = q.join(Paymentschedules.companies_with_reduction)
The many-to-many relations are both defined in our companies model, and look like this:
paymentschedules_with_reduction = relationship("Paymentschedules", secondary=companies_paymentschedules, backref="companies_with_reduction")
paymentschedules = relationship("Paymentschedules", secondary=companies_comp_paymentschedules, backref="companies")
The problem is that the JOINS trigger SQLAlchemy to create a SQL statement that looks like this:
FROM paymentschedules
JOIN companies_comp_paymentschedules AS companies_comp_paymentschedules_1 ON paymentschedules.pmsd_id = companies_comp_paymentschedules_1.pmsd_id
JOIN companies ON companies.comp_id = companies_comp_paymentschedules_1.comp_id
JOIN companies_paymentschedules AS companies_paymentschedules_1 ON paymentschedules.pmsd_id = companies_paymentschedules_1.pmsd_id
JOIN companies ON companies.comp_id = companies_paymentschedules_1.comp_id
The two lookup tables have different names, but the related companies table is called "companies" in both cases, causing a SQL error:
[SQL Server Native Client 11.0][SQL Server]The objects "companies" and "companies" in the FROM clause have the same exposed names. Use correlation names to distinguish them. (1013) (SQLExecDirectW); ...]
I have been looking for a way to alias a join, or perhaps alias one of the relations from my lookup-tables to the companies table, but I was unable to do so.
Is there a way to alias a joined many-to-many table?
update:
Based on the suggestion by @IljaEverilä I found this: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=onclause (see "Joins to a Target with an ON Clause") as a method to alias a joined table, but the example only shows how to alias a one-to-many style join. In my case I need to alias the other side of my lookup table, so I can't apply the example code to my situation.
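For reference, one way this is commonly handled (a sketch, assuming the mapped class for the companies table is named Company; other names follow the question): pass an aliased() entity as the join target together with the relationship, so each many-to-many join gets its own alias of companies while still going through the correct secondary table:

from sqlalchemy.orm import aliased

linked = aliased(Company)      # companies joined via companies_comp_paymentschedules
discounted = aliased(Company)  # companies joined via companies_paymentschedules

q = Paymentschedules.query
if company is not None:
    q = q.join(linked, Paymentschedules.companies)
    q = q.join(discounted, Paymentschedules.companies_with_reduction)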

How can I update one column in a right outer join style query in SQLAlchemy (PostgreSQL/Python)?

I have two tables, Table A and Table B. I have added one column to Table A, record_id. Table B has record_id and the primary ID for Table A, table_a_id. I am looking to deprecate Table B.
Relationships exist between Table B's table_a_id and Table A's id, if that helps.
Currently, my solution is:
db.execute("UPDATE table_a t
SET record_id = b.record_id
FROM table_b b
WHERE t.id = b.table_a_id")
This is my first time using this ORM -- I'd like to see if there is a way I can use my Python models and the actual functions SQLAlchemy gives me to be more 'Pythonic', rather than just dumping a Postgres statement that I know works into an execute() call.
My solution ended up being as follows:
(db.query(TableA)
    .filter(TableA.id == TableB.table_a_id,
            TableA.record_id.is_(None))
    .update({TableA.record_id: TableB.record_id}, synchronize_session=False))
This leverages the ability of PostgreSQL to do updates based on implicit references to other tables, which I did in my .filter() call (this is analogous to the WHERE in a JOIN query). The solution was deceptively simple.
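The same statement can also be built with the update() construct instead of Query.update(); a sketch in the SQLAlchemy 1.4+ style, assuming the same TableA/TableB models (PostgreSQL renders the second table as UPDATE ... FROM):

from sqlalchemy import update

stmt = (
    update(TableA)
    .where(TableA.id == TableB.table_a_id)
    .where(TableA.record_id.is_(None))
    .values(record_id=TableB.record_id)
)
db.execute(stmt)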

Unable to access aliased fields in SQLAlchemy query results?

I am confused working with query result objects. I am not using foreign keys in this example.
lookuplocation = aliased(ValuePair)
lookupoccupation = aliased(ValuePair)
persons = db.session.query(Person.lastname, lookuplocation.displaytext, lookupoccupation.displaytext).\
    outerjoin(lookuplocation, Person.location == lookuplocation.valuepairid).\
    outerjoin(lookupoccupation, Person.occupation1 == lookupoccupation.valuepairid).all()
Results are correct as far as data is concerned. However, when I try to access an individual row of data I have an issue:
persons[0].lastname works as I expected and returns data.
However, there is only one displaytext attribute on the result rows; since I aliased the displaytext entity twice, I get just one of the two values. I understand why this happens, but I need to know which aliased field names to use to get at the two displaytext columns.
The actual SQL statement generated by the above join is as follows:
SELECT person.lastname AS person_lastname, valuepair_1.displaytext AS valuepair_1_displaytext, valuepair_2.displaytext AS valuepair_2_displaytext
FROM person LEFT OUTER JOIN valuepair AS valuepair_1 ON person.location = valuepair_1.valuepairid LEFT OUTER JOIN valuepair AS valuepair_2 ON person.occupation1 = valuepair_2.valuepairid
But none of these "as" field names are available to me in the results.
I'm new to SqlAlchemy so most likely this is a "newbie" issue.
Thanks.
Sorry - RTFM issue - the columns should have been labeled:
lookuplocation.displaytext.label("myfield1"),
lookupoccupation.displaytext.label("myfield2")
After the results are returned, reference the fields as persons[0].myfield1 and persons[0].myfield2.
Simple.
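For completeness, here is the corrected query in full; this is just the query from the question with the two labels applied:

from sqlalchemy.orm import aliased

lookuplocation = aliased(ValuePair)
lookupoccupation = aliased(ValuePair)
persons = db.session.query(
        Person.lastname,
        lookuplocation.displaytext.label("myfield1"),
        lookupoccupation.displaytext.label("myfield2")).\
    outerjoin(lookuplocation, Person.location == lookuplocation.valuepairid).\
    outerjoin(lookupoccupation, Person.occupation1 == lookupoccupation.valuepairid).all()

# each row now exposes the labeled names
print(persons[0].lastname, persons[0].myfield1, persons[0].myfield2)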

Filtering on Django backreferences

I am trying to query all objects in a table without backreferences from another model.
class A(models.Model):
    pass

class B(models.Model):
    reference = models.ForeignKey(A)
In order to get all A objects with no references from any B objects, I do
A.objects.filter(b__isnull=True)
The Django documentation on isnull does not mention backreferences at all.
Can I get into trouble with this or is it just poorly documented?
I tried this with Django 1.10.3, using your same code example.
Let's take a look at the raw SQL statement that Django creates:
>>> print(A.objects.filter(b__isnull=True).query)
SELECT "backrefs_a"."id" FROM "backrefs_a" LEFT OUTER JOIN "backrefs_b" ON ("backrefs_a"."id" = "backrefs_b"."reference_id") WHERE "backrefs_b"."id" IS NULL
Technically, this isn't the exact same SQL that Django will send to the database, but it's close enough. For more discussion, see https://stackoverflow.com/a/1074224/5044893
If we pretty it up a bit, you can see that the query is actually quite safe:
SELECT "backrefs_a"."id"
FROM "backrefs_a"
LEFT OUTER JOIN "backrefs_b" ON ("backrefs_a"."id" = "backrefs_b"."reference_id")
WHERE "backrefs_b"."id" IS NULL
The LEFT in the LEFT OUTER JOIN ensures that you will get records from A, even when there are no matching records in B. And, because Django won't save a B with a NULL id, you can rest assured that it will work as you expect.
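As a quick sanity check, the same set can also be expressed without __isnull (a sketch using a subquery; both querysets return the A rows that no B references):

a_without_b = A.objects.filter(b__isnull=True)
same_thing = A.objects.exclude(id__in=B.objects.values_list("reference_id", flat=True))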

Automatically merging arbitrary Django models

I have two Django-ORM managed databases that I'd like to merge. Both have a very similar schema, and both have the standard auth_users table, along with a few other shared tables that reference each other as well as auth_users, which I'd like to merge into a single database automatically.
Understandably, this could be very non-trivial depending upon the foreign-key relationships, and what constitutes a "unique" record in each table.
Does anyone know if there exists a tool to do this merge operation?
If nothing like this currently exists, I was considering writing my own management command, based on the standard loaddata command. Essentially, you'd use the standard dumpdata command to export tables from a source database, and then use a modified version of loaddata to "merge" them into the destination database.
For example, if I have databases A and B, and I want to merge database B into database A, then I'd want to follow a procedure according to the pseudo-code:
merge_database_dst = A
merge_database_src = B
for table in sorted(merge_database_dst.get_redundant_tables(merge_database_src),
                    key=acyclic_dependency):
    key = table.get_unique_column_key()
    src_id_to_dst_id = {}
    for record_src in merge_database_src.table.objects.all():
        src_key_value = record_src.get_key_value(key)
        try:
            record_dst = merge_database_dst.table.objects.get(**{key: src_key_value})
            dst_key_value = record_dst.get_key_value(key)
        except merge_database_dst.table.DoesNotExist:
            record_dst = merge_database_dst.table(
                **{k: convert_fk(v) for k, v in record_src._meta.fields})
            record_dst.save()
            dst_key_value = record_dst.get_key_value(key)
        src_id_to_dst_id[(table, record_src.id)] = record_dst.id
The convert_fk() function would use the src_id_to_dst_id index to convert foreign key references in the source table to the equivalent IDs in the destination table.
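A hypothetical sketch of what convert_fk() might look like, assuming it also receives the model field alongside the value (field.rel was the pre-Django-2.0 way to detect a foreign key, matching the South era of this question):

def convert_fk(field, value, src_id_to_dst_id):
    # Remap a source-database FK value to the destination id recorded
    # for the already-merged parent row; non-FK values pass through.
    if field.rel is not None:  # field is a ForeignKey
        return src_id_to_dst_id[(field.rel.to, value)]
    return value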
To summarize, the algorithm would iterate over the tables to be merged in order of dependency, with parents iterated over first. So if we wanted to merge tables auth_users and mycustomprofile, where mycustomprofile depends on auth_users, we'd iterate ['auth_users', 'mycustomprofile'].
Each merged table would need some sort of indicator documenting the combination of columns that denotes a universally unique record (i.e. the "key"). For auth_users, that might be the "username" and/or "email" column.
If the value of the key in database B already exists in A, then the record is not imported from B, but the ID of the existing record in A is recorded.
If the value of the key in database B does not exist in A, then the record is imported from B, and the ID of the new record is recorded.
Using the previously recorded ID, a mapping is created, explaining how to map foreign-key references to that specific record in B to the new merged/pre-existing record in A. When future records are merged into A, this mapping would be used to convert the foreign keys.
I could still envision cases where an imported record references a table not included in the dumpdata, which might cause the entire import to fail, so some sort of "dry run" option would be needed to simulate the import and ensure all FK references can be translated.
Does this seem like a practical approach? Is there a better way?
EDIT: This isn't exactly what I'm looking for, but I thought others might find it interesting. The Turbion project has a mechanism for copying changes between equivalent records in different Django models within the same database. It works by defining a translation layer (i.e. merging.ModelLayer) between two Django models, so if you update the "www" field in user bob@bob.com's profile, it automatically updates the "url" field in user bob@bob.com's otherprofile.
The functionality I'm looking for is a bit different, in that I want to merge an entire (or partial) database snapshot at infrequent intervals, sort of the way the loaddata management command does.
Wow. This is going to be a complex job regardless. That said:
If I understand the needs of your project correctly, this is something that can be done with a data migration in South. Even so, I'd be lying if I said it was going to be easy.
My recommendation is -- and this is mostly a parrot of an assumption in your question, but I want to make it clear -- that you have one "master" table that is the base, and which has records from the other table added to it. So, table A keeps all of its existing records, and only gets additions from B. B feeds additions into A, and once done, B is deleted.
I'm hesitant to write you sample code because your actual job will be so much more complex than this, but I will anyway to try and point you in the right direction. Consider something like...
import datetime
from south.db import db
from south.v2 import DataMigration
from django.db import models

class Migration(DataMigration):

    def forwards(self, orm):
        for b in orm.B.objects.all():
            # sanity check: does this item get copied into A at all?
            if orm.A.objects.filter(username=b.username):
                continue
            # make an A record with the properties of my B record
            a = orm.A(
                first_name=b.first_name,
                last_name=b.last_name,
                email_address=b.email_address,
                # [...] map the remaining fields here
            )
            # save the new A record, and delete the B record
            a.save()
            b.delete()

    def backwards(self, orm):
        # backwards method, if you write one
        pass
This would end up migrating all of the Bs not in A to A, and leave you a table of Bs that are expected duplicates, which you could then check by some other means before deleting.
Like I said, this sample isn't meant to be complete. If you decide to go this route, spend time in the South documentation, and particularly make sure you look at data migrations.
That's my 2¢. Hope it helps.
