thanks in advance for your help.
I have two entities, Human and Chimp. Each has a collection of metrics, which can contain subclasses of a MetricBlock, for instance CompleteBloodCount (with fields WHITE_CELLS, RED_CELLS, PLATELETS).
So my object model looks like (forgive the ASCII art):
--------- metrics --------------- ----------------------
| Human | ----------> | MetricBlock | <|-- | CompleteBloodCount |
--------- --------------- ----------------------
^
--------- metrics |
| Chimp | --------------
---------
This is implemented with the following tables:
Chimp (id, …)
Human (id, …)
MetricBlock (id, dtype)
CompleteBloodCount (id, white_cells, red_cells, platelets)
CholesterolCount (id, hdl, ldl)
ChimpToMetricBlock(chimp_id, metric_block_id)
HumanToMetricBlock(human_id, metric_block_id)
So a human knows its metric blocks, but a metric block does not know its human or chimp.
I would like to write a query in SQLAlchemy to find all CompleteBloodCounts for a particular human. In SQL I could write something like:
SELECT cbc.id
FROM complete_blood_count cbc
WHERE EXISTS (
SELECT 1
FROM human h
INNER JOIN human_to_metric_block h_to_m on h.id = h_to_m.human_id
WHERE
h_to_m.metric_block_id = cbc.id
)
I'm struggling though to write this in SQLAlchemy. I believe correlate(), any(), or an aliased join may be helpful, but the fact that a MetricBlock doesn't know its Human or Chimp is a stumbling block for me.
Does anyone have any advice on how to write this query? Alternately, are there other strategies to define the model in a way that works better with SQLAlchemy?
Thank you for your assistance.
Python 2.6
SQLAlchemy 0.7.4
Oracle 11g
Edit:
HumanToMetricBlock is defined as:
humanToMetricBlock = Table(
"human_to_metric_block",
metadata,
Column("human_id", Integer, ForeignKey("human.id"),
Column("metric_block_id", Integer, ForeginKey("metric_block.id")
)
per the manual.
Each primate should have a unique ID, regardless of what type of primate they are. I'm not sure why each set of attributes (MB, CBC, CC) are separate tables, but I assume that they have more than one dimension (primate) such as time, otherwise I would only have one giant table.
Thus, I would structure this problem in the following manner:
Create a parent object Primate and derive humans and chimps from it. This example is using single table inheritance, though you may want to use joined table inheritance based on their attributes.
class Primate(Base):
__tablename__ = 'primate'
id = Column(Integer, primary_key=True)
genus = Column(String)
...attributes all primates have...
__mapper_args__ = {'polymorphic_on': genus, 'polymorphic_identity': 'primate'}
class Chimp(Primate):
__mapper_args__ = {'polymorphic_identity': 'chimp'}
...attributes...
class Human(Primate):
__mapper_args__ = {'polymorphic_identity': 'human'}
...attributes...
class MetricBlock(Base):
id = ...
Then you create a single many-to-many table (you can use an association proxy instead):
class PrimateToMetricBlock(Base):
id = Column(Integer, primary_key=True) # primary key is needed!
primate_id = Column(Integer, ForeignKey('primate.id'))
primate = relationship('Primate') # If you care for relationships.
metricblock_id = Column(Integer, ForeignKey('metric_block.id')
metricblock = relationship('MetricBlock')
Then I would structure the query like so (note that the on clause is not necessary since SQLAlchemy can infer the relationships automatically since there's no ambiguity):
query = DBSession.query(CompleteBloodCount).\
join(PrimateToMetricBlock, PrimateToMetricBlock.metricblock_id == MetricBlock.id)
If you want to filter by primate type, join the Primate table and filter:
query = query.join(Primate, Primate.id == PrimateToMetricBlock.primate_id).\
filter(Primate.genus == 'human')
Otherwise, if you know the ID of the primate (primate_id), no additional join is necessary:
query = query.filter(PrimateToMetricBlock.primate_id == primate_id)
If you're only retrieving one object, end the query with:
return query.first()
Otherwise:
return query.all()
Forming your model like this should eliminate any confusion and actually make everything simpler. If I'm missing something, let me know.
Related
I would like to be able to get info about what types will be created during SQLAlchemy's create_all(). Yes, they can be printed if I set up echo-ing of generated SQL, but how can i print it without actually hitting database? For example, I have a model:
class MyModel(Base):
id = Column(Integer, primary_key=True)
indexed_field = Column(String(50))
enum_field = Column(Enum(MyEnum))
__table_args__ = (
Index("my_ix", indexed_field),
)
where MyEnum is:
class MyEnum(enum.Enum):
A = 0
B = 1
I can get CREATE TABLE statement and all CREATE INDEX statements like this:
from sqlalchemy.schema import CreateTable, CreateIndex
print(str(CreateTable(MyModel.__table__).compile(postgres_engine)))
for idx in MyModel.__table__.indexes:
print(str(CreateIndex(idx)).compile(postgres_engine))
Result will be something like that:
CREATE TABLE my_model (
id SERIAL NOT NULL,
indexed_field VARCHAR(50),
enum_field myenum,
PRIMARY KEY (id)
)
CREATE INDEX my_ix ON my_model (indexed_field)
Notice the line enum_field myenum. How can I get generated SQL for CREATE TYPE myenum... statement?
I've found the answer!
from sqlalchemy.dialects.postgresql.base import PGInspector
PGInspector(postgres_engine).get_enums()
It returns a list of all created enums, which IMO is even better than raw sql, documentation is here.
I have three models (note that this is done in Flask-SQLAlchemy, but if you can only write an answer for vanilla SQLAlchemy, that is fine with me.) Irrelevant fields are removed for clarity.
class KPI(db.Model):
__tablename__ = 'kpis'
id = db.Column(db.Integer, primary_key=True)
identifier = db.Column(db.String(length=50))
class Report(db.Model):
__tablename__ = 'reports'
id = db.Column(db.Integer, primary_key=True)
class ReportKPI(db.Model):
report_id = db.Column(db.Integer, db.ForeignKey('reports.id'), primary_key=True)
kpi_id = db.Column(db.Integer, db.ForeignKey('kpis.id'), primary_key=True)
report = db.relationship('Report', backref=db.backref('values'))
kpi = db.relationship('KPI')
My goal is to find all Report objects that don’t measure a specific KPI (ie. there is no ReportKPI object whose KPI relationship has identifier set to a specific value).
One of my tries look like
Report.query \
.join(ReportKPI) \
.join(KPI) \
.filter(KPI.identifier != 'reflection')
but this gives back more Report objects that actually exist (I guess I get one for every ReportKPI that has a KPI with anything but “reflection”.)
Is the thing I want to achieve actually possible with SQLAlchemy? If so, what is the magic word (pleas doesn’t seem to work…)
An EXISTS subquery expression is a good fit for your goal. A shorthand way to write such a query would be:
Report.query.\
filter(db.not_(Report.values.any(
ReportKPI.kpi.has(identifier='reflection'))))
but this produces 2 nested EXISTS expressions, though a join in an EXISTS would do as well:
Report.query.\
filter(db.not_(
ReportKPI.query.
filter_by(report_id=Report.id).
join(KPI).
filter_by(identifier='reflection').
exists()))
Finally, a LEFT JOIN with an IS NULL is an option as well:
Report.query.\
outerjoin(db.join(ReportKPI, KPI),
db.and_(ReportKPI.report_id == Report.id,
KPI.identifier == 'reflection')).\
filter(KPI.id.is_(None))
I have a, somewhat odd, query that gets me all the items in a parent table that have no matches in its corresponding child table.
If possible, id like to turn it into an SQLAlchemy query. But I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
"""parent table. holds customer names"""
__tablename__ = 'customer_translation'
id = Column(Integer, primary_key=True)
class customerEmails(Base):
"""child table. hold emails for customers in translation table"""
__tablename__ = 'customer_emails'
id = Column(Integer, primary_key=True)
parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)
You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with all the column names expanded, rather than using *, the ORM then can create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
... ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
or, if you have a a backreference on the customerTranslations class pointing to emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
~customerTranslations.emails.any())
Back in 2010 NOT EXISTS was a little slower on MySQL but you may want to re-assess if that is still the case.
I have three tables: UserTypeMapper, User, and SystemAdmin. In my get_user method, depending on the UserTypeMapper.is_admin row, I then query either the User or SystemAdmin table. The user_id row correlates to the primary key id in the User and SystemAdmin tables.
class UserTypeMapper(Base):
__tablename__ = 'user_type_mapper'
id = Column(BigInteger, primary_key=True)
is_admin = Column(Boolean, default=False)
user_id = Column(BigInteger, nullable=False)
class SystemAdmin(Base):
__tablename__ = 'system_admin'
id = Column(BigInteger, primary_key=True)
name = Column(Unicode)
email = Column(Unicode)
class User(Base):
__tablename__ = 'user'
id = Column(BigInteger, primary_key=True)
name = Column(Unicode)
email = Column(Unicode)
I want to be able to get any user – system admin or regular user – from one query, so I do a join, on either User or SystemAdmin depending on the is_admin row. For example:
DBSession.query(UserTypeMapper, SystemAdmin).join(SystemAdmin, UserTypeMapper.user_id==SystemAdmin.id).first()
and
DBSession.query(UserTypeMapper, User).join(User, UserTypeMapper.user_id==User.id).first()
This works fine; however, I then would like to be access these, like so:
>>> my_admin_obj.is_admin
True
>>> my_admin_obj.name
Bob Smith
versus
>>> my_user_obj.is_admin
False
>>> my_user_obj.name
Bob Stevens
Currently, I have to specify: my_user_obj.UserTypeMapper.is_admin and my_user_obj.User.name. From what I've been reading, I need to map the tables so that I don't need to specify which table the attribute belongs to. My problem is that I do not understand how I can specify this given that I have two potential tables that the name attribute, for example, may come from.
This is the example I am referring to: Mapping a Class against Multiple Tables
How can I achieve this? Thank you.
You have discovered why "dual purpose foreign key", is an antipattern.
There is a related problem to this that you haven't quite pointed out; there's no way to use a foreign key constraint to enforce the data be in a valid state. You want to be sure that there's exactly one of something for each row in UserTypeMapper, but that 'something' is not any one table. formally you want a functional dependance on
user_type_mapper → (system_admin× 1) ∪ (user× 0)
But most sql databses won't allow you to write a foreign key constraint expressing that.
It looks complicated because it is complicated.
instead, lets consider what we really want to say; "every system_admin should be a user; or
system_admin → user
In sql, that would be written:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
name VARCHAR,
email VARCHAR
);
CREATE TABLE system_admin (
user_id INTEGER PRIMARY KEY REFERENCES user(id)
);
Or, in sqlalchemy declarative style
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
name = Column(String)
email = Column(String)
class SystemAdmin(Base):
__tablename__ = 'system_admin'
user_id = Column(ForeignKey(User.id), primary_key=True)
What sort of questions does this schema allow us to ask?
"Is there a SystemAdmin by the name of 'john doe'"?
>>> print session.query(User).join(SystemAdmin).filter(User.name == 'john doe').exists()
EXISTS (SELECT 1
FROM "user" JOIN system_admin ON "user".id = system_admin.user_id
WHERE "user".name = :name_1)
"How many users are there? How many sysadmins?"
>>> print session.query(func.count(User.id), func.count(SystemAdmin.user_id)).outerjoin(SystemAdmin)
SELECT count("user".id) AS count_1, count(system_admin.user_id) AS count_2
FROM "user" LEFT OUTER JOIN system_admin ON "user".id = system_admin.user_id
I hope you can see why the above is prefereable to the design you describe in your question; but in the off chance you don't have a choice (and only in that case, if you still feel what you've got is better, please refine your question), you can still cram that data into a single python object, which will be very difficult to work with, by providing an alternate mapping to the tables; specifically one which follows the rough structure in the first equation.
We need to mention UserTypeMapper twice, once for each side of the union, for that, we need to give aliases.
>>> from sqlalchemy.orm import aliased
>>> utm1 = aliased(UserTypeMapper)
>>> utm2 = aliased(UserTypeMapper)
For the union bodies join each alias to the appropriate table: Since SystemAdmin and User have the same columns in the same order, we don't need to describe them in detail, but if they are at all different, we need to make them "union compatible", by mentioning each column explicitly; this is left as an exercise.
>>> utm_sa = Query([utm1, SystemAdmin]).join(SystemAdmin, (utm1.user_id == SystemAdmin.id) & (utm1.is_admin == True))
>>> utm_u = Query([utm2, User]).join(User, (utm2.user_id == User.id) & (utm2.is_admin == False))
And then we join them together...
>>> print utm_sa.union(utm_u)
SELECT anon_1.user_type_mapper_1_id AS anon_1_user_type_mapper_1_id, anon_1.user_type_mapper_1_is_admin AS anon_1_user_type_mapper_1_is_admin, anon_1.user_type_mapper_1_user_id AS anon_1_user_type_mapper_1_user_id, anon_1.system_admin_id AS anon_1_system_admin_id, anon_1.system_admin_name AS anon_1_system_admin_name, anon_1.system_admin_email AS anon_1_system_admin_email
FROM (SELECT user_type_mapper_1.id AS user_type_mapper_1_id, user_type_mapper_1.is_admin AS user_type_mapper_1_is_admin, user_type_mapper_1.user_id AS user_type_mapper_1_user_id, system_admin.id AS system_admin_id, system_admin.name AS system_admin_name, system_admin.email AS system_admin_email
FROM user_type_mapper AS user_type_mapper_1 JOIN system_admin ON user_type_mapper_1.user_id = system_admin.id AND user_type_mapper_1.is_admin = 1 UNION SELECT user_type_mapper_2.id AS user_type_mapper_2_id, user_type_mapper_2.is_admin AS user_type_mapper_2_is_admin, user_type_mapper_2.user_id AS user_type_mapper_2_user_id, "user".id AS user_id, "user".name AS user_name, "user".email AS user_email
FROM user_type_mapper AS user_type_mapper_2 JOIN "user" ON user_type_mapper_2.user_id = "user".id AND user_type_mapper_2.is_admin = 0) AS anon_1
While it's theoretically possible to wrap this all up into a python class that looks a bit like standard sqlalchemy orm stuff, I would certainly not do that. working with non-table mappings, especially when they are more than simple joins (this is a union), is lots of work for zero payoff.
Hi I have a simple question - i have 2 tables (addresses and users - user has one address, lot of users can live at the same address)... I created a sqlalchemy mapping like this:
when I get my session and try to query something like
class Person(object):
'''
classdocs
'''
idPerson = Column("idPerson", Integer, primary_key = True)
name = Column("name", String)
surname = Column("surname", String)
idAddress = Column("idAddress", Integer, ForeignKey("pAddress.idAddress"))
idState = Column("idState", Integer, ForeignKey("pState.idState"))
Address = relationship(Address, primaryjoin=idAddress==Address.idAddress)
class Address(object):
'''
Class to represent table address object
'''
idAddress = Column("idAddress", Integer, primary_key=True)
street = Column("street", String)
number = Column("number", Integer)
postcode = Column("postcode", Integer)
country = Column("country", String)
residents = relationship("Person",order_by="desc(Person.surname, Person.name)", primaryjoin="idAddress=Person.idPerson")
self.tablePerson = sqlalchemy.Table("pPerson", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Person, self.tablePerson)
self.tableAddress = sqlalchemy.Table("pAddress", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Address, self.tableAddress)
myaddress = session.query(Address).get(1);
print myaddress.residents[1].name
=> I get TypeError: 'RelationshipProperty' object does not support indexing
I understand residents is there to define the relationship but how the heck can I get the list of residents that the given address is assigned to?!
Thanks
You define a relationship in a wrong place. I think you are mixing Declarative Extension with non-declarative use:
when using declarative, you define your relations in your model.
otherwise, you define them when mapping model to a table
If option-2 is what you are doing, then you need to remove both relationship definitions from the models, and add it to a mapper (only one is enought):
mapper(Address, tableAddress,
properties={'residents': relationship(Person, order_by=(desc(Person.name), desc(Person.surname)), backref="Address"),}
)
Few more things about the code above:
Relation is defined only on one side. The backref takes care about the other side.
You do not need to specify the primaryjoin (as long as you have a ForeignKey specified, and SA is able to infer the columns)
Your order_by configuration is not correct, see code above for the version which works.
You might try defining Person after Address, with a backref to Address - this will create the array element:
class Address(object):
__tablename__ = 'address_table'
idAddress = Column("idAddress", Integer, primary_key=True)
class Person(object):
idPerson = Column("idPerson", Integer, primary_key = True)
...
address_id = Column(Integer, ForeignKey('address_table.idAddress'))
address = relationship(Address, backref='residents')
Then you can query:
myaddress = session.query(Address).get(1);
for residents in myaddress.residents:
print name
Further, if you have a lot of residents at an address you can further filter using join:
resultset = session.query(Address).join(Address.residents).filter(Person.name=='Joe')
# or
resultset = session.query(Person).filter(Person.name=='Joe').join(Person.address).filter(Address.state='NY')
and resultset.first() or resultset[0] or resultset.get(...) etc...