Upsert / Replace with Flask, SqlAlchemy and PostgreSQL [duplicate] - python
I have a record that I want to exist in the database if it is not there, and if it is there already (primary key exists) I want the fields to be updated to the current state. This is often called an upsert.
The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?
Base = declarative_base()

class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))

    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc
def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name=desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        # default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        # default already exists. Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()
Is there a better or less verbose way of doing this? Something like this would be great:
sess.upsert_this(desired_default, unique_key = "name")
although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily); I added it only because SQLAlchemy tends to work with the primary key alone. For example, I've looked at whether Session.merge would be applicable, but that works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.
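For illustration, here is a minimal sketch of why Session.merge falls short here (hedged: it assumes the Template model above):

# Session.merge keys on the primary key only. Because Template's id is an
# autoincrementing column we never set, merge() always INSERTs a new row
# here -- it cannot upsert on the `name` unique key.
merged = sess.merge(Template("default", "AABBCC", "This is the default template"))
sess.flush()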
A sample use case for this is simply when starting up a server application that may have upgraded its default expected data. ie: no concurrency concerns for this upsert.
SQLAlchemy supports ON CONFLICT through its PostgreSQL dialect, with the two methods on_conflict_do_update() and on_conflict_do_nothing().
Copying from the documentation:
from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
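The same statement also works through an ORM Session instead of a Core connection (a minimal sketch; the session is assumed to already exist):

# Session.execute accepts the same Core insert statement.
session.execute(stmt)
session.commit()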
SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.save_or_update call. This is not an "upsert", but it may be good enough for your needs.
It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing with the given ID exists, or if ID is None, create a new record; else update all other fields in the existing record with that primary key.
However, when there are additional unique constraints, there are logical issues with that simple approach. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.
That would be the reason there is no built in "upsert" operation. The application must define what this means in each particular case.
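A concrete sketch of the ambiguity, using the Template model from the question (the row values are hypothetical):

# Existing rows (hypothetical):
#   id=1, name='default', template='AABBCC'
#   id=2, name='fancy',   template='DDEEFF'
incoming = Template("default", "DDEEFF", "clashes with two rows")
incoming.id = 1
# Its primary key matches row 1, but its unique `template` value matches row 2.
# Updating row 1 violates the unique constraint on `template`; updating row 2
# violates it on `name`. The application has to decide what "upsert" means here.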
Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. Those functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.
Although these two functions make upserting with SQLAlchemy's syntax reasonably straightforward, they are far from a complete out-of-the-box solution to upserting.
My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting. First, the higher-level ORM functionality we've gotten used to is missing: you cannot use ORM objects but instead have to provide foreign-key values at the time of insertion. Second, rows within the same batch that collide on a unique constraint make the whole statement fail, so duplicates have to be filtered out beforehand.
I use the following function I wrote to handle both of these issues:
from sqlalchemy import UniqueConstraint, inspect
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column
                    for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints
                          if isinstance(c, UniqueConstraint)]

    def handle_foreignkeys_constraints(row):
        # Replace a nested ORM object (keyed by the referenced table's name)
        # with the plain foreign-key value the INSERT needs.
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        # Drop rows that repeat a unique-constraint value already seen in
        # this batch; they would make the whole statement fail.
        for const in unique_constraints:
            unique = tuple([const] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
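Usage might look like this (a sketch; MyModel and the row dicts are placeholders, and I assume MyModel declares a UniqueConstraint on "name"):

rows = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
    {"id": 3, "name": "second"},  # dropped: repeats a unique value within this batch
]
upsert(session, MyModel, rows)
session.commit()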
I use a "look before you leap" approach:
# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
filter(Switch_Command.switch_id == switch.id).\
filter(Switch_Command.command_id == command.id).first()
# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)
# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()
session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()
The advantage is that this is db-neutral and I think it's easy to read. The disadvantage is that there's a potential race condition in a scenario like the following:
we query the db for a switch_command and don't find one
we create a switch_command
another process or thread creates a switch_command with the same primary key as ours
we try to commit our switch_command
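One common mitigation, not from the original answer but sketched here under the same model names, is to catch the constraint violation and fall back to an update:

from sqlalchemy.exc import IntegrityError

try:
    session.commit()
except IntegrityError:
    # Another writer inserted the same row between our query and our commit.
    session.rollback()
    switch_command = session.query(Switch_Command).\
        filter(Switch_Command.switch_id == switch.id).\
        filter(Switch_Command.command_id == command.id).one()
    switch_command.output = 'Hooray!'
    switch_command.lastseen = datetime.datetime.utcnow()
    session.commit()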
There are multiple answers and here comes yet another answer (YAA). Other answers are not that readable due to the metaprogramming involved. Here is an example that
Uses SQLAlchemy ORM
Shows how to create a row if there are zero rows using on_conflict_do_nothing
Shows how to update the existing row (if any) without creating a new row using on_conflict_do_update
Uses the table primary key as the constraint
A longer example is in the original question this code relates to.
import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session
class PairState(Base):
    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)

    pair = orm.relationship(Pair,
                            backref=orm.backref("pair_state",
                                                lazy="dynamic",
                                                cascade="all, delete-orphan",
                                                single_parent=True, ), )

    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if it does not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
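Usage is then a one-liner per state change (a sketch, assuming an open dbsession and an existing pair row with id 1):

now = datetime.datetime.now(datetime.timezone.utc)
PairState.create_first_event_if_not_exist(dbsession, pair_id=1, ts=now)
PairState.update_last_event(dbsession, pair_id=1, ts=now)
dbsession.commit()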
The below works fine for me with a Redshift database and will also work for a combined primary key constraint.
SOURCE : this
Just a few modifications are required for creating the SQLAlchemy engine in the function start_engine().
import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql
Base = declarative_base()
def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    return engine
class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])
def compile_query(query):
    compiler = (query.compile if not hasattr(query, 'statement')
                else query.statement.compile)
    return compiler(dialect=postgresql.dialect())
def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
    )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)
session = start_engine()  # note: start_engine() returns an Engine; Engine.execute() works on SQLAlchemy 1.x
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
This allows access to the underlying models based on string names
def get_class_by_tablename(tablename):
    """Return class reference mapped to table.

    https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to

    :param tablename: String with name of table.
    :return: Class reference or None.
    """
    for c in Base._decl_class_registry.values():
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
            return c
sqla_tbl = get_class_by_tablename(table_name)
def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts
    """
    try:
        self.active_session().add(table(**record_dict))
    except:
        # Here we'll assume the error is caused by an integrity error.
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine
        # <update>add explicit error handling for each db engine</update>
        active_session.rollback()
        # Query for the conflicting class, use update method to change values based on dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # List of primary key col names
        c_tbl_cols = dict(sqla_tbl.__table__.columns)  # String:Col Object crosswalk
        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # sub-dict from data of primary key:values
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # col-object:query value for primary key cols
        c_target_record = session.query(sqla_tbl).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
This works for me with sqlite3 and postgres, although it might fail with combined primary key constraints and will most likely fail with additional unique constraints.
try:
    t = self._meta.tables[data['table']]
except KeyError:
    self._log.error('table "%s" unknown', data['table'])
    return

try:
    q = insert(t, values=data['values'])
    self._log.debug(q)
    self._db.execute(q)
except IntegrityError:
    self._log.warning('integrity error')
    where_clause = [c.__eq__(data['values'][c.name]) for c in t.c if c.primary_key]
    update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
    q = update(t, values=update_dict).where(*where_clause)
    self._log.debug(q)
    self._db.execute(q)
except Exception as e:
    self._log.error('%s: %s', t.name, e)
As we had problems with generated default ids and references, which led to ForeignKeyViolation errors like
update or delete on table "..." violates foreign key constraint
Key (id)=(...) is still referenced from table "...".
we had to exclude the id from the update dict, as otherwise it would always be regenerated as a new default value.
In addition, the method returns the created/updated entity.
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert  # Important to use the postgresql insert

def upsert(session, data, key_columns, model):
    stmt = insert(model).values(data)

    # Important to exclude the ID for the update!
    exclude_for_update = [model.id.name, *key_columns]
    update_dict = {c.name: c for c in stmt.excluded if c.name not in exclude_for_update}

    stmt = stmt.on_conflict_do_update(
        index_elements=key_columns,
        set_=update_dict
    ).returning(model)

    orm_stmt = (
        select(model)
        .from_statement(stmt)
        .execution_options(populate_existing=True)
    )
    return session.execute(orm_stmt).scalar()
Example:
class UpsertUser(Base):
    __tablename__ = 'upsert_user'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    name: str = Column(sa.String, nullable=False)
    user_sid: str = Column(sa.String, nullable=False, unique=True)
    house_admin = relationship('UpsertHouse', back_populates='admin', uselist=False)

class UpsertHouse(Base):
    __tablename__ = 'upsert_house'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    admin_id: Id = Column(Id, ForeignKey('upsert_user.id'), nullable=False)
    admin: UpsertUser = relationship('UpsertUser', back_populates='house_admin', uselist=False)

# Usage
upserted_user = upsert(session, updated_user, [UpsertUser.user_sid.name], UpsertUser)
Note: only tested on PostgreSQL, but it could also work for other DBs that support ON DUPLICATE KEY UPDATE, e.g. MySQL.
In the case of SQLite, the sqlite_on_conflict='REPLACE' option can be used when defining a UniqueConstraint, and sqlite_on_conflict_unique for a unique constraint on a single column. Then session.add will work just like an upsert. See the official documentation.
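A minimal sketch of what that looks like (the Setting model and its columns are hypothetical, for illustration only):

from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Setting(Base):  # hypothetical model
    __tablename__ = 'settings'
    id = Column(Integer, primary_key=True)
    # Single-column form: the conflict behavior is baked into the CREATE TABLE
    # DDL, so an INSERT that collides on `key` replaces the old row.
    key = Column(String, unique=True, sqlite_on_conflict_unique='REPLACE')
    value = Column(String)
    # Multi-column form would instead be:
    # __table_args__ = (UniqueConstraint('key', 'scope', sqlite_on_conflict='REPLACE'),)

Note that these options only take effect in the generated DDL, so the table must be created through Base.metadata.create_all(engine).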
I use this code for upserts. Before using it, you should add primary keys to the table in the database.
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.inspection import inspect
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.dialects.postgresql import insert
def upsert(df, engine, table_name, schema=None, chunk_size=1000):
    metadata = MetaData(schema=schema)
    metadata.bind = engine

    table = Table(table_name, metadata, schema=schema, autoload=True)

    # only use common columns between df and table.
    table_columns = {column.name for column in table.columns}
    df_columns = set(df.columns)
    intersection_columns = table_columns.intersection(df_columns)
    df1 = df[list(intersection_columns)]  # pandas rejects a raw set as an indexer
    records = df1.to_dict('records')

    # get list of fields making up primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    with engine.connect() as conn:
        chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
        for chunk in chunks:
            stmt = insert(table).values(chunk)
            update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
            s = stmt.on_conflict_do_update(
                index_elements=primary_keys,
                set_=update_dict)
            conn.execute(s)
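Usage with a pandas DataFrame might look like this (a sketch; the URL, table, and columns are placeholders, and the target table must already exist with a primary key):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://localhost:5432/mydb')  # placeholder URL
df = pd.DataFrame([
    {'id': 1, 'name': 'first'},
    {'id': 2, 'name': 'second'},
])
upsert(df, engine, 'my_table')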
Related
How to IGNORE duplicate keys in order to avoid errors when adding a new object to session [duplicate]
SQLAlchemy returns primary key instead of whole record from insert statement
I need to get the whole ORM instance which is created in the table after an insert statement, since the database generates the UUID for the primary key (database - postgresql).

stmt = insert(Table).values(data).returning(Table)
orm_instance = session.execute(stmt).scalar()

where Table is defined very simply:

class Table(BaseModel):
    __tablename__ = "table"
    uuid = Column(UUID(as_uuid=True), primary_key=True)
    name = Column(String)
    # ... other fields

However, the statement above returns, for an unknown reason, only the primary key. Currently I have to execute a select to continue working, which is ugly.

stmt = insert(Table).values(data).returning(Table)
uuid = session.execute(stmt).scalar()
stmt = select(Table).where(Table.uuid == uuid)
orm_instance = session.execute(stmt).scalar()

How do I return the whole instance without mentioning every single column in .returning()?
Just create your new object (without PK), add it to your session, and commit() (or at least flush()). The object will pick up the automatically generated PK value:

new_table = Table(name="x")
with Session(engine) as sess:
    print(new_table)  # Table(uuid=None, name='x')
    sess.add(new_table)
    sess.commit()
    print(new_table)  # Table(uuid=UUID('6b279e2c-9c1d-4f26-b530-add74c8f714d'), name='x')
SQLAlchemy 'entity' for `add_columns` not backed by a table
With a SQLAlchemy query like:

result = db.session.query(Model).add_columns(
    func.min(Model.foo).over().label("min_foo"),
    func.max(Model.foo).over().label("max_foo"),
    # ...
)

The result is an iterable of tuples, consisting of firstly the Model row, and then the added columns. How can I either:

Contribute the added columns to Model, such that they can be accessed from each element as model.min_foo et al.; or
Map the added columns into a separate dataclass, such that they can be accessed as e.g. extra.min_foo?

The main thing I'm trying to achieve here is access by name - such as the given labels - without enumerating them all as model, min_foo, max_foo, ... and relying on maintaining the same order. With model, *extra, extra is just a plain list of the aggregate values; there's no reference to the label.

If I dynamically add the columns to the model first:

Model.min_foo = Column(Numeric)

then it complains:

Implicitly combining column modeltable.min_foo with column modeltable.min_foo under attribute 'min_foo'. Please configure one or more attributes for these same-named columns explicitly

Apparently the solution to that is to explicitly join the tables. But this isn't one! It seems that this ought to be possible with 'mappers', but I can't find any examples that don't explicitly map to a 'table name' or its columns, which I don't really have here - it's not clear to me if/how they can be used with aggregates, or other 'virtual' columns from the query that aren't actually stored in any table.
I think what you are looking for is query-time SQL expressions as mapped attributes:

from sqlalchemy import create_engine, Column, Integer, select, func
from sqlalchemy.orm import (Session, declarative_base, query_expression,
                            with_expression)

Base = declarative_base()

class Model(Base):
    __tablename__ = 'model'
    id = Column(Integer, primary_key=True)
    foo = Column(Integer)
    foo2 = Column(Integer, default=0)

engine = create_engine('sqlite:///', future=True)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Model(foo=10))
    session.add(Model(foo=20))
    session.add(Model(foo=30))
    session.add(Model(foo=40))
    session.add(Model(foo=50, foo2=1))
    session.add(Model(foo=60, foo2=1))
    session.add(Model(foo=70, foo2=1))
    session.add(Model(foo=80))
    session.add(Model(foo=90))
    session.add(Model(foo=100))
    session.commit()

    Model.min_foo = query_expression(func.min(Model.foo).over())
    stmt = select(Model).where(Model.foo2 == 1)
    models = session.execute(stmt).all()
    for model, in models:
        print(model.min_foo)

with Session(engine) as session:
    Model.max_foo = query_expression()
    stmt = select(Model).options(with_expression(Model.max_foo,
                                                 func.max(Model.foo).over())
                                 ).where(Model.foo2 == 1)
    models = session.execute(stmt).all()
    for model, in models:
        print(model.max_foo)

You can define a default expression when defining the query_expression, or, using .options with with_expression, you can define a runtime expression. The only thing is that the mapped attribute cannot be unmapped, and it will return None (as for max_foo here) when no default expression is defined.
SQLAlchemy's "post_update" behaves differently with objects that have been expunged from a session
I'm trying to copy rows from one DB instance to another DB with an identical schema in a different environment. Two tables within this schema are linked in such a way that they result in mutually dependent rows. When these rows are inserted, the post_update runs afterward as expected, but the UPDATE statement sets the value of the ID field to None instead of the expected ID. This only happens when using objects that have been expunged from a session. When using newly created objects, the post_update behaves exactly as expected.

Examples

I have a relationship set up that looks like this:

class Category(Base):
    __tablename__ = 'categories'
    id = Column(Integer, primary_key=True)
    top_product_id = Column(Integer, ForeignKey('products.id'))
    products = relationship('Product',
                            primaryjoin='Product.category_id == Category.id',
                            back_populates='category', cascade='all', lazy='selectin')
    top_product = relationship('Product',
                               primaryjoin='Category.top_product_id == Product.id',
                               post_update=True, cascade='all', lazy='selectin')

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    category_id = Column(Integer, ForeignKey('categories.id'))
    category = relationship('Category',
                            primaryjoin='Product.category_id == Category.id',
                            back_populates='products', cascade='all', lazy='selectin')

If I query a category and its related products from one DB and try to write them to another, the update of top_product_id doesn't behave as expected, and sets the value to None instead. The following code:

category = source_session.query(Category).filter(Category.id == 99).one()
source_session.expunge(category)
make_transient(category)
for product in category.products:
    make_transient(product)
# this step is necessary to prevent a foreign key error on the initial category insert
category.top_product_id = None
dest_session.add(category)

results in SQLAlchemy generating the following SQL:

INSERT INTO categories (name, top_product_id) VALUES (%s, %s) ('SomeCategoryName', None)
INSERT INTO products (name, category_id) VALUES (%s, %s) ('SomeProductName', 99)
UPDATE categories SET top_product_id=%s WHERE categories.id = %s (None, 99)

But if I use newly created objects, everything works as expected.

category = Category()
product = Product()
category.name = 'SomeCategoryName'
product.name = 'SomeProductName'
product.category = category
category.top_product = product
dest_session.add(category)

results in:

INSERT INTO categories (name, top_product_id) VALUES (%s, %s) ('SomeCategoryName', None)
INSERT INTO products (name, category_id) VALUES (%s, %s) ('SomeProductName', 99)
UPDATE categories SET top_product_id=%s WHERE categories.id = %s (1, 99)

Aside from this difference, everything behaves in the same way between these two actions. All other relationships are created properly; IDs and foreign keys are set as expected. Only the top_product_id set in the UPDATE clause created by the post_update fails to behave as expected.

As an additional troubleshooting step, I tried:

Creating new objects
Adding them to a session
Flushing the session to the DB
Expunging the objects from the session
Unsetting the foreign key ID fields on the objects (to avoid the initial insert error) and making the objects transient
Re-adding the objects to the session
Re-flushing to the DB

On the first flush to the DB, the top_product_id is set properly. On the second, it's set to None. So this confirms that the issue is not with differences in the sessions, but something to do with expunging objects from sessions and making them transient.
There must be something that does or doesn't happen during the expunge/make-transient process that leaves these objects in a fundamentally different state and prevents post_update from behaving the way it should. Any ideas on where to go from here would be appreciated.
I assume your Base class mixes in the name column? Your goal is to make inspect(category).committed_state look like it does for newly created objects (except maybe for the id attribute). Same for each product object.

In your "newly created objects" example, category's committed_state looks like this before flushing the session:

{'id': symbol('NEVER_SET'), 'name': symbol('NO_VALUE'), 'products': [], 'top_product': symbol('NEVER_SET')}

while product's committed_state looks like this:

{'category': symbol('NEVER_SET'), 'id': symbol('NEVER_SET'), 'name': symbol('NO_VALUE')}

To get the post-update behavior, you need to both expire category.top_product_id (to prevent it from being included in the INSERT) and fudge category.top_product's committed_state (to make SQLAlchemy believe that the value has changed and therefore needs to cause an UPDATE).

First, expire category.top_product_id before making category transient:

source_session.expire(category, ["top_product_id"])

Then fudge category.top_product's committed_state (this can happen before or after making category transient):

from sqlalchemy import inspect
from sqlalchemy.orm.base import NEVER_SET
inspect(category).committed_state.update(top_product=NEVER_SET)

Full example:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, make_transient, relationship
from sqlalchemy.orm.base import NEVER_SET

class Base(object):
    name = Column(String(50), nullable=False)

Base = declarative_base(cls=Base)

class Category(Base):
    __tablename__ = 'categories'
    id = Column(Integer, primary_key=True)
    top_product_id = Column(Integer, ForeignKey('products.id'))
    products = relationship('Product',
                            primaryjoin='Product.category_id == Category.id',
                            back_populates='category', cascade='all', lazy='selectin')
    top_product = relationship('Product',
                               primaryjoin='Category.top_product_id == Product.id',
                               post_update=True, cascade='all', lazy='selectin')

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    category_id = Column(Integer, ForeignKey('categories.id'), nullable=False)
    category = relationship('Category',
                            primaryjoin='Product.category_id == Category.id',
                            back_populates='products', cascade='all', lazy='selectin')

source_engine = create_engine('sqlite:///')
dest_engine = create_engine('sqlite:///', echo=True)

def fk_pragma_on_connect(dbapi_con, con_record):
    dbapi_con.execute('pragma foreign_keys=ON')

from sqlalchemy import event

for engine in [source_engine, dest_engine]:
    event.listen(engine, 'connect', fk_pragma_on_connect)

Base.metadata.create_all(bind=source_engine)
Base.metadata.create_all(bind=dest_engine)

source_session = Session(bind=source_engine)
dest_session = Session(bind=dest_engine)

source_category = Category(id=99, name='SomeCategoryName')
source_product = Product(category=source_category, id=100, name='SomeProductName')
source_category.top_product = source_product
source_session.add(source_category)
source_session.commit()
source_session.close()

# If you want to test UPSERTs in dest_session.
# dest_category = Category(id=99, name='PrevCategoryName')
# dest_product = Product(category=dest_category, id=100, name='PrevProductName')
# dest_category.top_product = dest_product
# dest_session.add(dest_category)
# dest_session.commit()
# dest_session.close()

category = source_session.query(Category).filter(Category.id == 99).one()

# Ensure relationship attributes are initialized before we make objects transient.
_ = category.top_product

# source_session.expire(category, ['id'])  # only if you want new IDs in dest_session
source_session.expire(category, ['top_product_id'])

for product in category.products:
    # Ensure relationship attributes are initialized before we make objects transient.
    _ = product.category
    # source_session.expire(product, ['id'])  # only if you want new IDs in dest_session
    # Not strictly needed as long as Product.category is not a post-update relationship.
    source_session.expire(product, ['category_id'])

make_transient(category)
inspect(category).committed_state.update(top_product=NEVER_SET)

for product in category.products:
    make_transient(product)
    # Not strictly needed as long as Product.category is not a post-update relationship.
    inspect(product).committed_state.update(category=NEVER_SET)

dest_session.add(category)
# Or, if you want UPSERT (must retain original IDs in this case)
# dest_session.merge(category)
dest_session.flush()

Which produces this DML in dest_session:

INSERT INTO categories (name, id, top_product_id) VALUES (?, ?, ?) ('SomeCategoryName', 99, None)
INSERT INTO products (name, id, category_id) VALUES (?, ?, ?) ('SomeProductName', 100, 99)
UPDATE categories SET top_product_id=? WHERE categories.id = ? (100, 99)

It seems like make_transient should reset committed_state to be as if it were a new object, but I guess not.
Inserting record to mysql table, via sqlalchemy , with autoinc throws error
I am trying to insert a row into a table which has auto_increment as well as some foreign keys. All the foreign keys exist. But it throws this error:

sqlalchemy.orm.exc.FlushError: Instance Stock at 0x9cf062c has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.

Even insertion of the record via MySQL, by copy-pasting the SQL produced by echo=True, executes fine.

Stock class:

class Stock(Base):
    __tablename__ = 'Stock'
    Code = Column('Code', String(8), primary_key=True)
    Symbol = Column('Symbol', String(128))
    ListingName = Column('ListingName', String(256))
    ListingDate = Column('ListingDate', DateTime())
    RecordAddedDate = Column('RecordAddedDate', DateTime())
    HomeCountry = Column('HomeCountry', ForeignKey('Country.Code'))
    PrimaryExchange = Column('PrimaryExchange', ForeignKey('Exchange.Code'))
    BaseCurrency = Column('BaseCurrency', ForeignKey('Currency.Code'))
    InstrumentType = Column('InstrumentType', ForeignKey('Instrument.InstrumentType'))

Record insertion:

Engine = sqlalchemy.create_engine('mysql://user:pass@host/db', echo=True)
Session = sqlalchemy.orm.sessionmaker(bind=Engine)
SessionObj = Session()
NewStock = Stock()
NewStock.InstrumentType = 'Stock'
NewStock.Symbol = 'MSFT'
NewStock.ListingName = 'Microsoft'
NewStock.HomeCountry = 'IN'
NewStock.PrimaryExchange = 'NSEOI'
NewStock.BaseCurrency = 'INR'
NewStock.ListingDate = datetime.datetime.now().strftime("%Y%m%d")
NewStock.RecordAddedDate = datetime.datetime.now().strftime("%Y%m%d")
print NewStock
SessionObj.add(NewStock)
SessionObj.flush()
print NewStock.Code
Add autoincrement=True to your column.
Got it. I had the column type as String; after converting it to Integer it worked fine.
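Putting the two answers together, the working column definition would be something along these lines (a sketch based on the fixes above):

Code = Column('Code', Integer, primary_key=True, autoincrement=True)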