I want to map a class to a selectable that is a join between two tables, selecting all the columns from one table but only one column from the joined table.
join_table = join(table1, table2, table1.c.description == table2.c.description)
model_table_join = select([table1, table2.c.description]).select_from(join_table).alias()
Am I doing this right?
If all you want to do is pull in one extra column from a JOIN, I'd not muck about with an arbitrary select mapping. As the documentation points out:
The practice of mapping to arbitrary SELECT statements, especially complex ones as above, is almost never needed; it necessarily tends to produce complex queries which are often less efficient than that which would be produced by direct query construction. The practice is to some degree based on the very early history of SQLAlchemy where the mapper() construct was meant to represent the primary querying interface; in modern usage, the Query object can be used to construct virtually any SELECT statement, including complex composites, and should be favored over the “map-to-selectable” approach.
You'd just either select that extra column in your application:
session.query(Table1Model, Table2Model.description).join(Table2Model)
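Since in the original setup the tables are related only by matching description values rather than by a foreign key, the ON clause may need to be spelled out explicitly. A minimal sketch, assuming Table1Model and Table2Model are mapped to table1 and table2 and both have a description column:

results = (
    session.query(Table1Model, Table2Model.description)
    .join(Table2Model, Table1Model.description == Table2Model.description)
    .all()
)
for table1_obj, description in results:
    ...  # each result row is a (Table1Model, description) tuple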
or you can register a relationship on Table1Model and an association proxy that always pulls in the extra column:
from sqlalchemy.ext.associationproxy import association_proxy

class Table1Model(Base):
    # ...
    _table2 = relationship('Table2Model', lazy='joined')
    description = association_proxy('_table2', 'description')
The association proxy exposes the Table2Model.description column of the joined row as you interact with it on Table1Model instances.
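As a rough illustration of what that looks like in use (assuming the relationship is scalar, i.e. at most one matching Table2Model row per Table1Model row):

obj = session.query(Table1Model).first()
print(obj.description)  # reads through to the related Table2Model.description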
That said, if you must stick with a join() query as the base, then you could just exclude the extra, duplicated columns from the join, with an exclude_properties mapper argument:
join_table = join(table1, table2, table1.c.description == table2.c.description)

class JoinedTableModel(Base):
    __table__ = join_table
    __mapper_args__ = {
        'exclude_properties': [table1.c.description],
    }
The new model then uses all the columns from the join to create attributes with the same names, except for those listed in `exclude_properties`.
Or you can keep using duplicated column names in the model simply by giving them a new name:
join_table = join(table1, table2, table1.c.description == table2.c.description)

class JoinedTableModel(Base):
    __table__ = join_table
    table1_description = table1.c.description
You can rename any column from the join this way, at which point they will no longer conflict with other columns with the same base name from the other table.
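A hypothetical query against such a join-mapped model, using the renamed attribute from the snippet above:

rows = (
    session.query(JoinedTableModel)
    .filter(JoinedTableModel.table1_description == 'some value')
    .all()
)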
I am working with a database that does not have relationships created between tables, and changing the schema is not an option for me.
I'm trying to describe in the ORM how to join two tables without declaring foreign keys. To make things worse, I need a custom ON clause in my SQL.
Here is my ORM (more or less):
class Table1(Base):
    __tablename__ = "table1"
    id1 = Column(String, primary_key=True)  # SQLAlchemy needs a primary key to map the class
    id2 = Column(String)

class Table2(Base):
    __tablename__ = "table2"
    id1 = Column(String, primary_key=True)
    id2 = Column(String)
Goal
What I'm trying to create is a relationship that joins the tables like this:
.....
FROM Table1
JOIN Table2 ON (Table1.id1 = Table2.id1 OR Table1.id2 = Table2.id2)
My Attempt
I tried adding the following to Table1, but the documentation does not explain what is wrong with it in terms I can understand:
table2 = relationship("Table2",
                      primaryjoin=or_(foreign(id1) == remote(Table2.id1),
                                      foreign(id2) == remote(Table2.id2)))
But when I tested this I got the wrong SQL query back (I expected to see the join I described above in the SQL):
str(query(Table1,Table2))
SELECT "table1".id1, "table1".id2, "table2".id1, "table2".id2
FROM "table1","table2"
Note
I don't really understand what remote and foreign do, but I tried to infer from the documentation where they belong; without them I would get an error on import saying:
ArgumentError: Could not locate any relevant foreign key columns for primary join condition 'my full primaryjoin code' on relationship Table1.other_table. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or are annotated in the join condition with the foreign() annotation.
I don't think that I can use ForeignKey or ForeignKeyConstraint because none of my columns are constrained to the other table's values.
The expression
str(query(Table1,Table2))
produces a cross join between the 2 tables, as you've observed. This is the expected behaviour. If you want to use inner joins etc., you'll have to be explicit about it:
str(query(Table1, Table2).join(Table1.table2))
This joins along the relationship attribute table2. The attribute indicates how this join should happen.
Documentation on foreign() and remote() is a bit scattered for my taste as well, but it is established in "Adjacency List Relationships" and "Non-relational Comparisons / Materialized Path" that when the foreign and remote annotations are on different sides of the expression (in the ON clause), the relationship is considered to be many-to-one. When they are on the same side, or remote is omitted, it is considered one-to-many. So your relationship is considered to be many-to-one.
They are just an alternative to foreign_keys and remote_side parameters.
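Putting the pieces together, here is a self-contained sketch of the kind of mapping being discussed. The column types, primary keys and the viewonly flag are assumptions added to make it runnable, not part of the original code:

from sqlalchemy import Column, String, or_
from sqlalchemy.orm import declarative_base, foreign, relationship, remote

Base = declarative_base()

class Table2(Base):
    __tablename__ = "table2"
    id1 = Column(String, primary_key=True)
    id2 = Column(String)

class Table1(Base):
    __tablename__ = "table1"
    id1 = Column(String, primary_key=True)
    id2 = Column(String)

    # Table2 must already be defined (or a string primaryjoin used) at this point.
    table2 = relationship(
        "Table2",
        primaryjoin=or_(foreign(id1) == remote(Table2.id1),
                        foreign(id2) == remote(Table2.id2)),
        viewonly=True)  # there is no real FK, so keep the relationship read-only

# The join still has to be requested explicitly; it then follows the primaryjoin:
# print(session.query(Table1, Table2).join(Table1.table2))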
Starting from:
StartTable.objects.annotate(name=F('object_type_2__destination_table__name'))
Django writes a query containing this automatically:
LEFT OUTER JOIN "object" T4 ON ("start_table"."object_type_2_id" = T4."id")
LEFT OUTER JOIN "destination_table" ON (T4."id" = "destination_table"."object_id")
Is there a way to have Django make this more efficient by writing this instead?
JOIN destination_table ON destination_table.object_id = start_table.object_type_2_id
Some context to keep in mind: the start_table has several foreign key fields that all refer to the same object table, but for different reasons, which is why I've given object_type_2_id as the column name.
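For concreteness, here is a guess at the models the question implies; all names are inferred from the SQL above and may not match the real code:

from django.db import models

class Object(models.Model):
    class Meta:
        db_table = 'object'

class DestinationTable(models.Model):
    # column destination_table.object_id
    object = models.ForeignKey(Object, on_delete=models.CASCADE,
                               related_name='destination_table')
    name = models.CharField(max_length=255)

    class Meta:
        db_table = 'destination_table'

class StartTable(models.Model):
    # column start_table.object_type_2_id; one of several FKs to the object table
    object_type_2 = models.ForeignKey(Object, on_delete=models.CASCADE,
                                      related_name='+')

    class Meta:
        db_table = 'start_table'

With that layout, F('object_type_2__destination_table__name') has to hop through object, which is why Django emits the two LEFT OUTER JOINs shown above.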
In a legacy database we have, there is a pretty special data structure where two many-to-many relations join the same two tables, companies and paymentschedules.
There is a many-to-many relation using an association table called companies_paymentschedules, and a second many-to-many relation using an association table called companies_comp_paymentschedules.
Both relations serve different purposes.
companies_paymentschedules stores paymentschedules for which the company has a discount, companies_comp_paymentschedules stores paymentschedules that are linked to the company.
(I know that this could be simplified by replacing these tables with a single lookup table, but that is not an option in this legacy database.)
The problem is that I need to join both types of companies (discounted and linked) in the same query. SQLAlchemy joins both association tables without problems, but it also joins the companies table for each of them and calls both "companies", which leads to a SQL syntax error (using MSSQL, BTW).
This is the query:
q = Paymentschedules.query

# join companies if a company is specified
if company is not None:
    q = q.join(Paymentschedules.companies)
    q = q.join(Paymentschedules.companies_with_reduction)
The many-to-many relations are both defined in our companies model, and look like this:
paymentschedules_with_reduction = relationship("Paymentschedules", secondary=companies_paymentschedules, backref="companies_with_reduction")
paymentschedules = relationship("Paymentschedules", secondary=companies_comp_paymentschedules, backref="companies")
The problem is that the JOINS trigger SQLAlchemy to create a SQL statement that looks like this:
FROM paymentschedules
JOIN companies_comp_paymentschedules AS companies_comp_paymentschedules_1 ON paymentschedules.pmsd_id = companies_comp_paymentschedules_1.pmsd_id
JOIN companies ON companies.comp_id = companies_comp_paymentschedules_1.comp_id
JOIN companies_paymentschedules AS companies_paymentschedules_1 ON paymentschedules.pmsd_id = companies_paymentschedules_1.pmsd_id
JOIN companies ON companies.comp_id = companies_paymentschedules_1.comp_id
The two lookup tables have different names, but the related companies table is called "companies" in both cases, causing a SQL error:
[SQL Server Native Client 11.0][SQL Server]The objects "companies" and "companies" in the FROM clause have the same exposed names. Use correlation names to distinguish them. (1013) (SQLExecDirectW); ...]
I have been looking for a way to alias a join, or perhaps alias one of the relations from my lookup-tables to the companies table, but I was unable to do so.
Is there a way to alias a joined many-to-many table?
update:
Based on the suggestion by @IljaEverilä I found this: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=onclause (see "Joins to a Target with an ON Clause") as a method to alias a joined table, but the example only shows how to alias a one-to-many type join. In my case I need to alias the other side of my lookup table, so I can't apply the example code to my situation.
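Perhaps the same pattern can be applied by aliasing the Companies entity itself and passing the alias as the join target, letting each relationship supply its own secondary table and ON clauses. This is an untested sketch, assuming a Companies model behind the two backrefs above:

from sqlalchemy.orm import aliased

LinkedCompanies = aliased(Companies)
ReducedCompanies = aliased(Companies)

q = Paymentschedules.query
if company is not None:
    # Each alias gets its own correlation name in the FROM clause,
    # avoiding the duplicate "companies" exposed name.
    q = q.join(LinkedCompanies, Paymentschedules.companies)
    q = q.join(ReducedCompanies, Paymentschedules.companies_with_reduction)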
I'm trying to extract information from a number of denormalized tables, using Django models. The tables are pre-existing, part of a legacy MySQL database.
Schema description
Let's say that each table describes traits about a person, and each person has a name (this essentially identifies the person, but does not correspond to some unifying "Person" table). For example:
class JobInfo(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    startdate = models.DateField(db_column='startdate')
    ...

class Hobbies(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    exercise = models.CharField(db_column='exercise')
    ...

class Clothing(models.Model):
    name = models.CharField(primary_key=True, db_column='name')
    shoes = models.CharField(db_column='shoes')
    ...

# Twenty more classes exist, all of the same format
Accessing via SQL
In raw SQL, when I want to access information across all tables, I do a series of ugly OUTER JOINs, refining it with a WHERE clause.
SELECT JobInfo.startdate, JobInfo.employer, JobInfo.salary,
       Hobbies.exercise, Hobbies.fun,
       Clothing.shoes, Clothing.shirt, Clothing.pants
       ...
FROM JobInfo
LEFT OUTER JOIN Hobbies ON Hobbies.name = JobInfo.name
LEFT OUTER JOIN Clothing ON Clothing.name = JobInfo.name
...
WHERE
    Clothing.shoes REGEXP "Nike" AND
    Hobbies.exercise REGEXP "out"
    ...;
Model-based approach
I'm trying to convert this to a Django-based approach, where I can easily get a QuerySet that pulls in information from all tables.
I've looked into using a OneToOneField (example), making one table have a field for tying it to each of the others. However, this would mean that one table becomes the "central" table, which all the others reference in reverse. This seems like a mess with twenty-odd fields, and doesn't really make schematic sense (is "job info" the core properties? clothes?).
I feel like I'm going about this the wrong way. How should I be building a QuerySet on related tables, where each table has one primary key field common across all tables?
If your DB access allows this, I would probably do it by defining a Person model, then declaring the name DB column to be a foreign key to that model, with to_field set to the name field on the Person model. Then you can use the usual __ syntax in your queries.
Assuming Django doesn't complain about a ForeignKey field with primary_key=True, anyway.
class Person(models.Model):
    name = models.CharField(primary_key=True, max_length=...)

class JobInfo(models.Model):
    person = models.ForeignKey(Person, primary_key=True, db_column='name',
                               to_field='name', on_delete=models.CASCADE)  # on_delete is required in Django 2.0+
    startdate = models.DateField(db_column='startdate')
    ...
I don't think to_field is actually required as long as name is declared as the primary key on Person, but I think it's good for clarity, and it is needed if you don't declare name as the PK on Person.
I haven't tested this, though.
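To make the idea concrete, a hypothetical query using that setup; the reverse lookup names (jobinfo, hobbies, clothing) are Django's defaults and would differ if related_name is set:

people = (
    Person.objects
    .filter(clothing__shoes__regex=r'Nike',
            hobbies__exercise__regex=r'out')
    .values('name', 'jobinfo__startdate', 'hobbies__exercise', 'clothing__shoes')
)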
To use a view, you have two options. I think both would do best with an actual table containing all the known user names, maybe with a numeric PK as Django usually expects as well. Let's assume that table exists - call it person.
One option is to create a single large view to encompass all information about a user, similar to the big join you use above - something like:
create or replace view person_info as
select person.id, person.name,
jobinfo.startdate, jobinfo.employer, jobinfo.salary,
hobbies.exercise, hobbies.fun,
clothing.shoes, ...
from person
left outer join hobbies on hobbies.name = person.name
left outer join jobinfo on jobinfo.name = person.name
left outer join clothing on clothing.name = person.name
;
That might take a little debugging, but the idea should be clear.
Then declare your model with db_table = 'person_info' and managed = False in the Meta class.
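Something along these lines, with the field list and max_length values as placeholders:

class PersonInfo(models.Model):
    id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=255)
    startdate = models.DateField(null=True)
    exercise = models.CharField(max_length=255, null=True)
    shoes = models.CharField(max_length=255, null=True)

    class Meta:
        managed = False
        db_table = 'person_info'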
A second option would be to declare a view for each subsidiary table that includes the person_id value matching the name, then just use Django FKs.
create or replace view jobinfo_by_person as
select person.id as person_id, jobinfo.*
from person inner join jobinfo on jobinfo.name = person.name;
create or replace view hobbies_by_person as
select person.id as person_id, hobbies.*
from person inner join hobbies on hobbies.name = person.name;
etc. Again, I'm not totally sure the .* syntax will work - if not, you'd have to list all the fields you're interested in. And check what the column names from the subsidiary tables are.
Then point your models at the by_person versions and use the standard FK setup.
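For example (again untested, and with placeholder fields):

class JobInfo(models.Model):
    person = models.ForeignKey(Person, primary_key=True, db_column='person_id',
                               on_delete=models.DO_NOTHING)  # view-backed, so no cascading
    startdate = models.DateField(db_column='startdate')

    class Meta:
        managed = False
        db_table = 'jobinfo_by_person'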
This is a little inelegant and I make no claims for good performance, but it does let you avoid further denormalizing your database.
Let's say I have two models, Document and Person. Document has a relationship to Person via an "owner" property. Now:
session.query(Document)\
    .options(joinedload('owner'))\
    .filter(Person.is_deleted != True)
This will join the Person table twice. One Person table is selected (the one used for the eager load), and the duplicated one is filtered, which is not exactly what I want, because this way the Document rows are not filtered.
What can I do to apply the filter to the table/model loaded via joinedload?
You are right, the Person table will be used twice in the resulting SQL, but each of them serves a different purpose:
one is to filter on the condition: filter(Person.is_deleted != True)
the other is to eager load the relationship: options(joinedload('owner'))
But the reason your query returns wrong results is that your filtering is incomplete. In order to make it produce the right results, you also need to JOIN the two models:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS IMPORTANT
       options(joinedload(Document.owner)).
       filter(Person.is_deleted != True)
       )
This will return correct rows, even though it will still have two references (JOINs) to the Person table. The real solution to your query is to use contains_eager instead of joinedload:
qry = (session.query(Document).
       join(Document.owner).  # THIS IS STILL IMPORTANT
       options(contains_eager(Document.owner)).
       filter(Person.is_deleted != True)
       )
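With contains_eager, the ORM populates Document.owner from the Person columns of the join you added explicitly, so the statement contains a single JOIN to Person and the filter applies to the same rows that are eagerly loaded.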