I am trying to automap my existing database to classes. Between two tables I have multiple join paths, and I am having trouble managing them properly with SQLAlchemy.
Here's a sample schema from my database:
CREATE SCHEMA shop;
CREATE TABLE shop.address (
id SERIAL PRIMARY KEY,
name text,
address text
);
CREATE TABLE shop.orders (
id SERIAL PRIMARY KEY,
items text,
billingaddr_id integer REFERENCES shop.address,
shippingaddr_id integer REFERENCES shop.address
);
I have declared relationships for those foreign keys as follows:
from sqlalchemy import create_engine
from sqlalchemy.orm import relationship, Session
from sqlalchemy.ext.automap import automap_base
engine = create_engine(
    "postgresql://postgres:postgres@localhost:5432/postgres",
    future=True
)
Base = automap_base()
class Order(Base):
    __tablename__ = 'orders'
    __table_args__ = {"schema": "shop"}
    billingaddr = relationship('address', foreign_keys="Order.billingaddr_id", backref="orders_billed")
    shippingaddr = relationship('address', foreign_keys="Order.shippingaddr_id", backref="orders_shipped")
Base.prepare(engine, schema='shop', reflect=True)
Address = Base.classes.address
Now when creating a new Address object (jack):
jack = Address(name='Jack', address='57815 Cheryl Unions')
I get a warning:
"SAWarning: relationship 'Order.address' will copy column address.id to column orders.shippingaddr_id, which conflicts with relationship(s): 'address.orders_shipped' (copies address.id to orders.shippingaddr_id), 'Order.shippingaddr' (copies address.id to orders.shippingaddr_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. The 'overlaps' parameter may be used to remove this warning."
How should this be solved?
That address relationship is the one automap creates automatically, and it now conflicts with the ones I declared. I don't actually need it anymore since I have created the relationships myself. Can I somehow prevent automap from creating it by default, or can I delete the unnecessary generated relationship? I have tried setting address = None in the class declaration, but it didn't work; the address relationship is still created.
Answering my own question since I got this solved (thanks to Mike from the sqlalchemy mailing list).
By default SQLAlchemy 1.4 uses the same name for both relationships, so the latter overrides the one created earlier. The naming of those relationships can be overridden by passing custom implementations of the naming functions to automap's Base.prepare. I implemented them to give each relationship a unique name. This way I don't even need to use the explicit declarative class style.
from sqlalchemy import create_engine
from sqlalchemy.orm import relationship, sessionmaker, backref
from sqlalchemy.ext.automap import automap_base, generate_relationship
engine = create_engine(
    "postgresql://postgres:postgres@localhost:5432/postgres",
    future=True
)
Base = automap_base()
def name_for_scalars(base, local_cls, referred_cls, constraint):
    if local_cls.__name__ == 'orders' and referred_cls.__name__ == 'address':
        if constraint.name == 'orders_billingaddr_id_fkey':
            return 'billingaddr'
        elif constraint.name == 'orders_shippingaddr_id_fkey':
            return 'shippingaddr'
    return referred_cls.__name__.lower()

def name_for_collections(base, local_cls, referred_cls, constraint):
    if local_cls.__name__ == 'address' and referred_cls.__name__ == 'orders':
        if constraint.name == 'orders_billingaddr_id_fkey':
            return 'orders_billed'
        elif constraint.name == 'orders_shippingaddr_id_fkey':
            return 'orders_shipped'
    return referred_cls.__name__.lower() + "_collection"

Base.prepare(
    autoload_with=engine,
    schema='shop',
    name_for_scalar_relationship=name_for_scalars,
    name_for_collection_relationship=name_for_collections
)
Order = Base.classes.orders
Address = Base.classes.address
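A quick usage sketch of my own (assuming the sample schema and the classes prepared above; the attribute names come from the custom naming functions):

from sqlalchemy.orm import Session

addr = Address(name='Jack', address='57815 Cheryl Unions')
order = Order(items='book', billingaddr=addr, shippingaddr=addr)

with Session(engine) as session:
    session.add(order)          # addr is cascaded in via the relationships
    session.commit()
    print(addr.orders_billed)   # collection named by name_for_collections
    print(addr.orders_shipped)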
When using SQLAlchemy I would like the foreign key fields to be filled in on the Python object when I pass in a related object. For example, assume you have network devices with ports, and assume that the device has a composite primary key in the database.
If I already have a reference to a "Device" instance and want to create a new "Port" instance linked to that device without knowing whether it already exists in the database, I would use the merge operation in SA. However, setting only the device attribute on the port instance is insufficient. The fields of the composite foreign key will not be propagated to the port instance, so SA will be unable to determine that the row exists in the database and will unconditionally issue an INSERT statement instead of an UPDATE.
The following code examples demonstrate the issue. They should be run as one .py file so we have the same in-memory SQLite instance! They have only been split for readability.
Model Definition
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Unicode, ForeignKeyConstraint, create_engine
from sqlalchemy.orm import sessionmaker, relation
from textwrap import dedent
Base = declarative_base()
class Device(Base):
    __tablename__ = 'device'
    hostname = Column(Unicode, primary_key=True)
    scope = Column(Unicode, primary_key=True)
    poll_ip = Column(Unicode, primary_key=True)
    notes = Column(Unicode)
    ports = relation('Port', backref='device')

class Port(Base):
    __tablename__ = 'port'
    __table_args__ = (
        ForeignKeyConstraint(
            ['hostname', 'scope', 'poll_ip'],
            ['device.hostname', 'device.scope', 'device.poll_ip'],
            onupdate='CASCADE', ondelete='CASCADE'
        ),
    )
    hostname = Column(Unicode, primary_key=True)
    scope = Column(Unicode, primary_key=True)
    poll_ip = Column(Unicode, primary_key=True)
    name = Column(Unicode, primary_key=True)
engine = create_engine('sqlite://', echo=True)
Base.metadata.bind = engine
Base.metadata.create_all()
Session = sessionmaker(bind=engine)
The model defines a Device class with a composite PK with three fields. The Port class references Device through a composite FK on those three columns. Device also has a relationship to Port which will use that FK.
Using the model
First, we add a new device and port. As we're using an in-memory SQLite DB, these will be the only two entries in it. By inserting one device into the database, we have something in the device table that we expect to be loaded on the subsequent merge in session "sess2".
sess1 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p1 = Port(device=d1, name='port1')
sess1.add(d1)
sess1.commit()
sess1.close()
Working example
This block works, but it is not written the way I would expect it to behave. More precisely, the instance "d1" is instantiated with "hostname", "scope" and "poll_ip", and that instance is passed to the "Port" instance "p2". I would expect "p2" to receive those three values through the foreign key, but it doesn't. I am forced to manually assign the values to "p2" before calling "merge". If the values are not assigned, SA does not find the identity and tries to run an "INSERT" query for "p2", which conflicts with the already existing row.
sess2 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p2 = Port(device=d1, name='port1')
p2.hostname=d1.hostname
p2.poll_ip=d1.poll_ip
p2.scope = d1.scope
p2 = sess2.merge(p2)
sess2.commit()
sess2.close()
Broken example (but expecting it to work)
This block shows how I would expect it to work. I would expect that assigning a value to "device" when creating the Port instance should be enough.
sess3 = Session()
d1 = Device(hostname='d1', scope='s1', poll_ip='pi1')
p2 = Port(device=d1, name='port1')
p2 = sess3.merge(p2)
sess3.commit()
sess3.close()
How can I make this last block work?
The FK of the child object isn't updated until you issue a flush() either explicitly or through a commit(). I think the reason for this is that if the parent object of a relationship is also a new instance with an auto-increment PK, SQLAlchemy needs to get the PK from the database before it can update the FK on the child object (but I stand to be corrected!).
According to the docs, a merge():
examines the primary key of the instance. If it’s present, it attempts
to locate that instance in the local identity map. If the load=True
flag is left at its default, it also checks the database for this
primary key if not located locally.
If the given instance has no primary key, or if no instance can be
found with the primary key given, a new instance is created.
As you are merging before flushing, there is incomplete PK data on your p2 instance, so the line p2 = sess3.merge(p2) returns a new Port instance with the same attribute values as the p2 you previously created, which is then tracked by the session. Then, sess3.commit() finally issues the flush, where the FK data is populated onto p2, and the integrity error is raised when it tries to write to the port table. Inserting a sess3.flush() would only raise the integrity error earlier, not avoid it.
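To see the flush behaviour in isolation, here is a small illustration of my own (reusing the Device, Port and Session definitions from the question, with new key values so it does not collide with the rows already inserted):

sess = Session()
d2 = Device(hostname='d2', scope='s2', poll_ip='pi2')
p3 = Port(device=d2, name='port1')
sess.add(d2)
print(p3.hostname)  # None - the composite FK columns are not populated yet
sess.flush()        # the flush copies hostname/scope/poll_ip from d2 onto p3
print(p3.hostname)  # 'd2'
sess.rollback()     # discard the demo rows
sess.close()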
Something like this would work:
def existing_or_new(sess, kls, **kwargs):
    inst = sess.query(kls).filter_by(**kwargs).one_or_none()
    if not inst:
        inst = kls(**kwargs)
    return inst
id_data = dict(hostname='d1', scope='s1', poll_ip='pi1')
sess3 = Session()
d1 = Device(**id_data)
p2 = existing_or_new(sess3, Port, name='port1', **id_data)
d1.ports.append(p2)
sess3.commit()
sess3.close()
This question has more thorough examples of existing_or_new style functions for SQLAlchemy.
I would like to do an upsert using the "new" functionality added in PostgreSQL 9.5, using SQLAlchemy Core. While it is implemented, I'm pretty confused by the syntax, which I can't adapt to my needs.
Here is sample code of what I would like to be able to do:
from sqlalchemy import Column, Integer, MetaData, Table, bindparam, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert as psql_insert

Base = declarative_base()

class User(Base):
    __tablename__ = 'test'
    a_id = Column('id', Integer, primary_key=True)
    a = Column("a", Integer)

engine = create_engine('postgres://name:password@localhost/test')
User().metadata.create_all(engine)

meta = MetaData(engine)
meta.reflect()
table = Table('test', meta, autoload=True)
conn = engine.connect()

stmt = psql_insert(table).values({
    table.c['id']: bindparam('id'),
    table.c['a']: bindparam('a'),
})
stmt = stmt.on_conflict_do_update(
    index_elements=[table.c['id']],
    set_={'a': bindparam('a')},
)

list_of_dictionary = [{'id': 1, 'a': 1}, {'id': 2, 'a': 2}]
conn.execute(stmt, list_of_dictionary)
I basically want to insert a bulk of rows, and if one id is already taken, I want to update it with the value I initially wanted to insert.
However, SQLAlchemy throws this error:
CompileError: bindparam() name 'a' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_a').
While this is a known issue (see https://groups.google.com/forum/#!topic/sqlalchemy/VwiUlF1cz_o), I haven't found any proper answer that does not require modifying either the keys of list_of_dictionary or the names of the columns.
I want to know if there is a way of constructing stmt so that it behaves consistently regardless of whether the keys of list_of_dictionary match the column names of the target table (my code works without error in those cases).
This does the trick for me:
import warnings

from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.inspection import inspect

def upsert(engine, schema, table_name, records=[]):
    metadata = MetaData(schema=schema)
    metadata.bind = engine

    table = Table(table_name, metadata, schema=schema, autoload=True)

    # get list of fields making up primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    # assemble base statement
    stmt = postgresql.insert(table).values(records)

    # define dict of non-primary keys for updating
    update_dict = {
        c.name: c
        for c in stmt.excluded
        if not c.primary_key
    }

    # cover case when all columns in the table comprise the primary key,
    # in which case an upsert is identical to 'on conflict do nothing'
    if update_dict == {}:
        warnings.warn('no updateable columns found for table')
        # we still want to insert without errors
        insert_ignore(table_name, records)  # separate helper, not shown here
        return None

    # assemble new statement with 'on conflict do update' clause
    update_stmt = stmt.on_conflict_do_update(
        index_elements=primary_keys,
        set_=update_dict,
    )

    # execute
    with engine.connect() as conn:
        result = conn.execute(update_stmt)
        return result
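As a hypothetical usage example (reusing the engine and the 'test' table from the question, and assuming it lives in the default 'public' schema):

records = [{'id': 1, 'a': 1}, {'id': 2, 'a': 2}]
upsert(engine, 'public', 'test', records=records)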
For anyone looking for an ORM solution, the following worked for me:
from typing import Any, Dict, Union

from sqlalchemy.orm import DeclarativeMeta, scoped_session, sessionmaker

def upsert(
    sa_sessionmaker: Union[sessionmaker, scoped_session],
    model: DeclarativeMeta,
    get_values: Dict[str, Any],
    update_values: Dict[str, Any],
) -> Any:
    """Upserts (updates if exists, else inserts) a SQLAlchemy model object.

    Note that get_values must uniquely identify a single model object (row) for this
    function to work.

    Args:
        sa_sessionmaker: SQLAlchemy sessionmaker to connect to the database.
        model: Model declarative metadata.
        get_values: Arguments used to try to retrieve an existing object.
        update_values: Desired attributes for the object fetched via get_values,
            or the new object if nothing was fetched.

    Returns:
        Model object subject to upsert.
    """
    with sa_sessionmaker() as session:
        instance = session.query(model).filter_by(**get_values).one_or_none()
        if instance:
            for attr, new_val in update_values.items():
                setattr(instance, attr, new_val)
        else:
            create_kwargs = get_values | update_values
            session.add(model(**create_kwargs))
        session.commit()
        instance = session.query(model).filter_by(**get_values).one_or_none()
    return instance
A few remarks:
If the primary key of the object is known, using Session.merge() is likely a better alternative to the function above (see the sketch after these remarks). In that sense, the function above assumes that the primary key is not known (and hence not part of get_values).
sa_sessionmaker is a factory for Session objects (see the docs).
model takes a SQLAlchemy declarative metadata (i.e., a "table"; see the docs).
Python >= 3.9 is required for the implementation above. If your environment runs an earlier version of Python, replace create_kwargs = get_values | update_values with create_kwargs = {**get_values, **update_values}.
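For the Session.merge() case mentioned in the first remark, a minimal sketch might look like this (User is a hypothetical mapped class whose primary key value is already known):

with sa_sessionmaker() as session:
    # merge() loads the row with this primary key if it exists and copies the new
    # attribute values onto it; otherwise a pending INSERT is created
    user = session.merge(User(id=1, name="Jack"))
    session.commit()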
Please excuse any terminology typos; I don't have a lot of experience with databases other than SQLite. I'm trying to replicate what I would do in SQLite, where I could ATTACH a database to a second database and query across all the tables. I wasn't using SQLAlchemy with SQLite.
I'm working with SQLAlchemy 1.0.13, Postgres 9.5 and Python 3.5.2 (using Anaconda) on Win7/64. I have connected two databases (on localhost) using postgres_fdw and imported a few of the tables from the secondary database. I can successfully query the connected table manually with SQL in PgAdminIII and from Python using psycopg2. With SQLAlchemy I've tried:
# Same connection string info that psycopg2 used
engine = create_engine(conn_str, echo=True)
class TestTable(Base):
    __table__ = Table('test_table', Base.metadata,
                      autoload=True, autoload_with=engine)
    # Added this when I got the error the first time
    # test_id is a primary key in the secondary table
    Column('test_id', Integer, primary_key=True)
and get the error:
sqlalchemy.exc.ArgumentError: Mapper Mapper|TestTable|test_table could not
assemble any primary key columns for mapped table 'test_table'
Then I tried:
insp = reflection.Inspector.from_engine(engine)
print(insp.get_table_names())
and the attached tables aren't listed (the tables from the primary database do show up). Is there a way to do what I am trying to accomplish?
In order to map a table, SQLAlchemy needs there to be at least one column denoted as a primary key column. This does not mean that the column needs to actually be a primary key column in the eyes of the database, though it is a good idea. Depending on how you've imported the table from your foreign schema, it may not have a representation of a primary key constraint, or any other constraints for that matter. You can work around this by either overriding the reflected primary key column in the Table instance (not in the mapped class's body), or better yet telling the mapper what columns comprise the candidate key:
engine = create_engine(conn_str, echo=True)
test_table = Table('test_table', Base.metadata,
autoload=True, autoload_with=engine)
class TestTable(Base):
    __table__ = test_table
    __mapper_args__ = {
        'primary_key': (test_table.c.test_id, )  # candidate key columns
    }
To inspect foreign table names use the PGInspector.get_foreign_table_names() method:
print(insp.get_foreign_table_names())
Building on the sibling answer by @ilja:
When using the SQLAlchemy automap feature to automatically generate mapped classes and relationships from an existing database schema, I found that the __mapper_args__ solution didn't create the model.
This alternative method, where you manually define the primary key column, will correctly enable automap to create your model.
from sqlalchemy import Column, create_engine, Text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.schema import Table
Base = automap_base()
engine = create_engine(conn_str, convert_unicode=True)
pk = Column('uid', Text, primary_key=True)
test_table = Table(
'test_table', Base.metadata, pk, autoload=True, autoload_with=engine
)
# Inspect postgres schema
Base.prepare(engine, reflect=True)
print(dict(Base.classes))
print(test_table)
I wrote a module which creates an empty database file:
def create_database():
    engine = create_engine("sqlite:///myexample.db", echo=True)
    metadata = MetaData(engine)
    metadata.create_all()
But in another function, I want to open the myexample.db database and create tables in it if it doesn't already have a given table.
For example, the first table I would subsequently create would be:
Table(Variable_TableName, metadata,
Column('Id', Integer, primary_key=True, nullable=False),
Column('Date', Date),
Column('Volume', Float))
(Since it is initially an empty database, it will have no tables in it, but subsequently I can add more tables to it. That's what I'm trying to say.)
Any suggestions?
I've managed to figure out what I intended to do. I used engine.dialect.has_table(engine, Variable_tableName) to check whether the database already contains the table. If it doesn't, the code proceeds to create the table in the database.
Sample code:
engine = create_engine("sqlite:///myexample.db") # Access the DB Engine
if not engine.dialect.has_table(engine, Variable_tableName): # If table don't exist, Create.
metadata = MetaData(engine)
# Create a table with the appropriate Columns
Table(Variable_tableName, metadata,
Column('Id', Integer, primary_key=True, nullable=False),
Column('Date', Date), Column('Country', String),
Column('Brand', String), Column('Price', Float),
# Implement the creation
metadata.create_all()
This seems to be giving me what i'm looking for.
Note that the 'Base.metadata' documentation states about create_all:
Conditional by default, will not attempt to recreate tables already
present in the target database.
You can also see that create_all takes these arguments: create_all(self, bind=None, tables=None, checkfirst=True), and according to the documentation, checkfirst:
Defaults to True, don't issue CREATEs for tables already present in
the target database.
So if I understand your question correctly, you can just skip the condition.
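In other words, a minimal sketch (reusing the engine, metadata and the Variable_TableName table from the question) could simply call create_all on every run and let the default checkfirst=True skip tables that already exist:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Date, Float

engine = create_engine("sqlite:///myexample.db")
metadata = MetaData(engine)

Table(Variable_TableName, metadata,
      Column('Id', Integer, primary_key=True, nullable=False),
      Column('Date', Date),
      Column('Volume', Float))

# checkfirst=True is the default: existing tables are left alone, missing ones are created
metadata.create_all(checkfirst=True)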
The accepted answer prints a warning that engine.dialect.has_table() is only for internal use and not part of the public API. The message suggests this as an alternative, which works for me:
import os
import sqlalchemy

# Set up a connection to a SQLite3 DB
test_db = os.getcwd() + "/test.sqlite"
db_connection_string = "sqlite:///" + test_db
engine = sqlalchemy.create_engine(db_connection_string)

# The recommended way to check for existence
sqlalchemy.inspect(engine).has_table("BOOKS")
See also the SQLAlchemy docs.
For those who define their tables first in a models.tables file, among other tables:
This is a code snippet for finding the class that represents the table we want to create (so that later we can use the same code to just query it).
Together with the if check written above, I still run the creation with checkfirst=True:
ORMTable.__table__.create(bind=engine, checkfirst=True)
models.tables
class TableA(Base):
    ...

class TableB(Base):
    ...

class NewTableC(Base):
    __tablename__ = 'NewTableC'  # added so the snippet is self-contained; adjust to your schema
    id = Column('id', Text, primary_key=True)  # a mapped class needs at least one primary key column
    name = Column('name', Text)
form
Then in the form action file:
engine = create_engine("sqlite:///myexample.db")
if not engine.dialect.has_table(engine, table_name):
# Added to models.tables the new table I needed ( format Table as written above )
table_models = importlib.import_module('models.tables')
# Grab the class that represents the new table
# table_name = 'NewTableC'
ORMTable = getattr(table_models, table_name)
# checkfirst=True to make sure it doesn't exists
ORMTable.__table__.create(bind=engine, checkfirst=True)
engine.dialect.has_table does not work for me on cx_oracle.
I am getting AttributeError: 'OracleDialect_cx_oracle' object has no attribute 'default_schema_name'
I wrote a workaround function:
from sqlalchemy.engine.base import Engine

def orcl_tab_or_view_exists(in_engine: Engine, in_object: str, in_object_name: str) -> bool:
    """Checks whether an Oracle table or view exists on the current in_engine connection.

    in_object: 'table' | 'view'
    in_object_name: table_name | view_name
    """
    obj_query = """SELECT {o}_name FROM all_{o}s WHERE owner = SYS_CONTEXT ('userenv', 'current_schema') AND {o}_name = '{on}'
    """.format(o=in_object, on=in_object_name.upper())

    with in_engine.connect() as connection:
        result = connection.execute(obj_query)
        return len(list(result)) > 0
This is the code that worked for me to create all tables for all model classes defined with the Base class:
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class YourTable(Base):
    __tablename__ = 'your_table'
    id = Column(Integer, primary_key=True)

DB_URL = "mysql+mysqldb://<user>:<password>@<host>:<port>/<db_name>"
scoped_engine = create_engine(DB_URL)

Base.metadata.create_all(scoped_engine)
I have the following alembic migration:
revision = '535f7a49839'
down_revision = '46c675c68f4'
from alembic import op
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from datetime import datetime
Session = sessionmaker()
Base = declarative_base()
metadata = sa.MetaData()
# This table definition works
organisations = sa.Table(
    'organisations',
    metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('creator_id', sa.Integer),
    sa.Column('creator_staff_member_id', sa.Integer),
)
"""
# This doesn't...
class organisations(Base):
    __tablename__ = 'organisations'
    id = sa.Column(sa.Integer, primary_key=True)
    creator_id = sa.Column(sa.Integer)
    creator_staff_member_id = sa.Column(sa.Integer)
"""

def upgrade():
    bind = op.get_bind()
    session = Session(bind=bind)
    session._model_changes = {}  # if you are using Flask-SQLAlchemy, this works around a bug
    print(session.query(organisations).all())
    raise Exception("don't succeed")

def downgrade():
    pass
Now the query session.query(organisations).all() works when I use the imperatively-defined table (the one not commented out). But if I use the declarative version, which as far as I understand should be equivalent, I get an error:
sqlalchemy.exc.AmbiguousForeignKeysError: Could not determine join
condition between parent/child tables on relationship
StaffMember.organisation - there are multiple foreign key paths
linking the tables. Specify the 'foreign_keys' argument, providing a
list of those columns which should be counted as containing a foreign
key reference to the parent table.
Now I understand what this error means: I have two foreign keys from organisations to staff_members in my actual models. But why does alembic care about these, and how does it even know they exist? How does this migration know that something called StaffMember exists? As far as I understand, alembic should only know about the models you explicitly tell it about in the migration.
It turns out the problem was with the Flask-Script setup I was using to call alembic. The command I was using to invoke alembic imported the code that initialises my Flask app, which in turn imported my actual models.
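For illustration only (module and app names here are hypothetical, not from the original setup), the kind of entry point that triggers this looks roughly like:

# manage.py - Flask-Script entry point used to run the migrations
from flask_script import Manager
from myapp import create_app   # importing the app factory also imports myapp.models as a side effect

app = create_app()              # the model classes (StaffMember, ...) are now registered with SQLAlchemy
manager = Manager(app)
# ... the 'db' / alembic commands are registered on this manager ...

if __name__ == '__main__':
    manager.run()

Because the app's models (including the class with two foreign key paths to organisations) are now imported, the first ORM query inside the migration configures all registered mappers, and the ambiguous StaffMember.organisation relationship raises AmbiguousForeignKeysError even though the migration never imports those models itself. Querying the imperatively-defined sa.Table avoids this because no mapper configuration is involved.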