SQLAlchemy IntegrityError: UNIQUE constraint failed: owner.owner_id - Python

I am in the process of bulk-adding objects to my database on the condition that each one
a) doesn't already exist in my database, and
b) isn't already in the process of being created (because the data I'm working with can hand me duplicate results).
Previously, to handle step b, I was storing the "to-be-created" objects in a list and iterating through it to check for a matching object, but I stumbled upon sets and figured I could just call the `.add` method on the set and know that the collection was being deduped for me.
I'm testing with a fresh database, so I know my issue lies in step b.
My code looks something like this:
new_owners = set()
for item in items:
    owner = Owner.find_owner_by_id(item['owner']['id'])
    if owner is None:
        owner = Owner(owner_id=item['owner']['id'], display_name=item['owner']['display_name'])
        new_owners.add(owner)

# print to check deduped set of owners
for owner in new_owners:
    print(f'{owner.display_name} | {owner.owner_id}')

db.session.add_all(new_owners)
db.session.commit()
Owner.py
@dataclass()
class Owner(db.Model):
    __tablename__ = 'owner'

    id = Column(Integer, primary_key=True)
    owner_id = Column(String(40), unique=True)
    display_name = Column(String(128), nullable=False)

    def __eq__(self, other):
        return self.owner_id == other.owner_id

    def __hash__(self):
        return hash(self.owner_id)
I'm not sure what I am missing at this point, because my `print` check before adding the objects to the database session doesn't show any duplicate objects, but somehow I still get this unique constraint error.
[SQL: INSERT INTO owner (owner_id, display_name) VALUES (?, ?)]
[parameters: ('yle42kxojqswhkwj77bb34g7x', 'RealBrhaka')]
That would only happen if this object appeared in the given data more than once, but I would expect the set to handle deduping it.
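For reference, the set-based dedupe in the question only works when `__eq__` and `__hash__` agree on the identity key. A minimal standalone sketch of that mechanism, using a plain class standing in for the SQLAlchemy model (names and values are illustrative, taken from the error above):

class FakeOwner:
    def __init__(self, owner_id, display_name):
        self.owner_id = owner_id
        self.display_name = display_name

    def __eq__(self, other):
        return self.owner_id == other.owner_id

    def __hash__(self):
        return hash(self.owner_id)

new_owners = set()
new_owners.add(FakeOwner('yle42kxojqswhkwj77bb34g7x', 'RealBrhaka'))
new_owners.add(FakeOwner('yle42kxojqswhkwj77bb34g7x', 'RealBrhaka'))
print(len(new_owners))  # 1 -- the second add is a no-op because __hash__ and __eq__ agree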

Related

SQLAlchemy: query(<Table>).filter(<Table.element.match(arg)>) does not work properly

So I have this filter statement:
x = self.session.query(URL).filter(URL.long_url.match(longurl))
URL Table
from base import Base
from sqlalchemy import Column, String, Boolean, Integer

class URL(Base):
    __tablename__ = "urls"

    short_url = Column(String(256), primary_key=True, unique=True, nullable=False)
    long_url = Column(String(256), nullable=False, unique=True)
    time = Column(String(256), nullable=False)

    def __init__(self, short_url, long_url, time):
        self.short_url = short_url
        self.long_url = long_url
        self.time = time
I am just trying to find the table row where the match is true.
When the statement above runs, I should get a URL object back in x.
Instead, I get a Query object, which suggests my filter could not find a row with the matching long_url.
What is wrong with my statement for it to be returning a Query object?
As the comment from @snakecharmerb mentioned, the statement needs to be executed before it returns anything more than just a Query object.
Adding .all() or .first() executes the query and returns either a list of matching objects or just the first matching object, respectively.
But even doing that is not enough to make this statement run without errors, as the current statement throws an OperationalError: .match() compiles to the database's MATCH operator, which stock SQLite does not support.
For my purposes, just replacing the .match() with a plain == comparison worked perfectly.
So the new statement became: x = self.session.query(URL).filter(URL.long_url == longurl).first()
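To make the lazy-evaluation point concrete, here is a small sketch building on the URL class above, assuming an in-memory SQLite database and illustrative values: filter() alone only builds a Query, while .first() actually executes it.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')  # in-memory database for the demo
Base.metadata.create_all(engine)     # uses the Base/URL classes above
session = sessionmaker(bind=engine)()

session.add(URL('abc', 'https://example.com/long', '2020-01-01'))
session.commit()

q = session.query(URL).filter(URL.long_url == 'https://example.com/long')
print(type(q))        # <class 'sqlalchemy.orm.query.Query'> -- nothing has run yet
row = q.first()       # executes the SELECT
print(row.short_url)  # 'abc'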

Why do I receive sqlalchemy.exc.IntegrityError?

I've run into an error that I can't make sense of.
I have these models:
class User(db.Model):
    __tablename__ = 'users'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))

class UserFile(db.Model):
    __tablename__ = 'user_files'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    author_id = db.Column(db.Integer, db.ForeignKey('users.id'), nullable=False)
    author = db.relationship(User, foreign_keys=[author_id])
I need to do a number of additional steps when I delete a UserFile instance.
When a UserFile instance is deleted directly, I can do whatever I need to do. There is a problem when the User instance is deleted. In this case, I need to remove all UserFile instances associated with the User. But I can't use cascade deletion, because I need to perform additional actions for each UserFile.
I tried using the SQLAlchemy 'before_delete' event, but I got an error because the handler actually ran after the deletion, even though it is called 'before'. I verified this by printing a message to the console and not seeing it until after I got the error.
Then I tried using Flask-SQLAlchemy signals. I did:
from flask_sqlalchemy import before_models_committed

@before_models_committed.connect_via(app)
def delete_all_user_folders_after_delete(sender, changes):
    for obj, operation in changes:
        if isinstance(obj, User) and operation == 'delete':
            print('files: ', UserFile.query.filter_by(author_id=obj.id, parent_id=None).all())
            for item in UserFile.query.filter_by(author_id=obj.id,
                                                 parent_id=None).all():
                print(item)
                delete_file(item, True)
And I got an error on this line:
print ('files: ', UserFile.query.filter_by(author_id=obj.id, parent_id=None).all())
What is the cause of this error, and how do I properly pre-delete all UserFiles before deleting a User?
Error description:
sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.IntegrityError) update or delete on table "users" violates foreign key constraint "user_files_author_id_fkey" on table "user_files"
DETAIL: Key (id)=(2) is still referenced from table "user_files".
The query performed in delete_all_user_folders_after_delete() causes autoflush, which flushes the deletions prematurely, before your manual cleanup has been done. The default referential action in PostgreSQL is NO ACTION, which "means that if any referencing rows still exist when the constraint is checked, an error is raised". It would seem that you have not deferred the constraint in question, so it is checked immediately.
You could perhaps try the solution proposed in the error message:
@before_models_committed.connect_via(app)
def delete_all_user_folders_after_delete(sender, changes):
    for obj, operation in changes:
        if isinstance(obj, User) and operation == 'delete':
            with db.session.no_autoflush:
                fs = UserFile.query.filter_by(author_id=obj.id, parent_id=None).all()
                print('files: ', fs)
                for item in fs:
                    print(item)
                    # If this queries the DB, remember to disable autoflush again
                    delete_file(item, True)
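As the inline comment warns, if delete_file() runs its own queries, those would trigger autoflush again. A sketch of guarding the helper as well; delete_file is the question's own function and its body is not shown, so everything below is purely illustrative:

def delete_file(item, recursive):
    # Illustrative body -- the real helper is not shown in the question.
    # Any query here would trigger autoflush, so guard it too:
    with db.session.no_autoflush:
        children = UserFile.query.filter_by(parent_id=item.id).all()
    if recursive:
        for child in children:
            delete_file(child, True)
    db.session.delete(item)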

Copying data from one sqlalchemy session to another

I have a SQLAlchemy schema containing three tables (A, B, and C), related via one-to-many foreign key relationships (A->B and B->C), with SQLite as a backend. I create separate database files to store data, each of which uses the exact same SQLAlchemy models and runs identical code to put data into them.
I want to be able to copy data from all these individual databases into a single new database file, while preserving the foreign key relationships. I tried the following code to copy data from one file to a new file:
import sqlalchemy
from sqlalchemy.ext import declarative
from sqlalchemy import Column, String, Integer
from sqlalchemy import orm, engine

Base = declarative.declarative_base()
Session = orm.sessionmaker()

class A(Base):
    __tablename__ = 'A'

    a_id = Column(Integer, primary_key=True)
    adata = Column(String)
    b = orm.relationship('B', back_populates='a', cascade='all, delete-orphan', passive_deletes=True)

class B(Base):
    __tablename__ = 'B'

    b_id = Column(Integer, primary_key=True)
    a_id = Column(Integer, sqlalchemy.ForeignKey('A.a_id', ondelete='SET NULL'))
    bdata = Column(String)
    a = orm.relationship('A', back_populates='b')
    c = orm.relationship('C', back_populates='b', cascade='all, delete-orphan', passive_deletes=True)

class C(Base):
    __tablename__ = 'C'

    c_id = Column(Integer, primary_key=True)
    b_id = Column(Integer, sqlalchemy.ForeignKey('B.b_id', ondelete='SET NULL'))
    cdata = Column(String)
    b = orm.relationship('B', back_populates='c')

file_new = 'file_new.db'
resource_new = 'sqlite:////%s' % file_new.lstrip('/')
engine_new = sqlalchemy.create_engine(resource_new, echo=False)
session_new = Session(bind=engine_new)

file_old = 'file_old.db'
resource_old = 'sqlite:////%s' % file_old.lstrip('/')
engine_old = sqlalchemy.create_engine(resource_old, echo=False)
session_old = Session(bind=engine_old)

for arow in session_old.query(A):
    session_new.add(arow)  # I am assuming that this will somehow know to copy all the child rows from tables B and C due to the foreign keys.
When run, I get the error "Object '' is already attached to session '2' (this is '1')". Any pointers on how to do this using SQLAlchemy and sessions? I also want to preserve the foreign key relationships within each database.
The use case is that data is first generated locally on non-networked machines and aggregated into a central db in the cloud. While the data will be generated in SQLite, the merge might happen in MySQL or Postgres, although here everything happens in SQLite for simplicity.
First, the reason you get that error is that the instance arow is still tracked by session_old, so session_new will refuse to deal with it. You can detach it from session_old:
session_old.expunge(arow)
This will allow you to add arow to session_new without issue, but you'll notice that nothing gets inserted into file_new. This is because SQLAlchemy knows that arow is persistent (meaning there's a row in the db corresponding to this object), and when you detach it and add it to session_new, SQLAlchemy still considers it persistent, so it does not get inserted again.
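One way to watch this state change is sqlalchemy.inspect(), which returns the instance's InstanceState; the flag names below are real SQLAlchemy attributes, and the surrounding setup comes from the question:

from sqlalchemy import inspect

state = inspect(arow)
print(state.persistent)  # True: attached to session_old with a DB identity
session_old.expunge(arow)
print(state.detached)    # True: no longer attached, but still has a DB identity,
                         # so session_new.add(arow) will not INSERT it again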
This is where Session.merge comes in. One caveat is that it won't merge unloaded relationships, so you'll need to eager load all the relationships you want to merge:
query = session_old.query(A).options(orm.subqueryload(A.b),
                                     orm.subqueryload(A.b, B.c))
for arow in query:
    session_new.merge(arow)
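Putting the answer together with the question's own engines might look like the following sketch; it assumes the schema does not yet exist in file_new and that primary keys from the old file do not collide with rows already present in the new one:

# Create the tables in the target database before merging
Base.metadata.create_all(engine_new)

query = session_old.query(A).options(orm.subqueryload(A.b),
                                     orm.subqueryload(A.b, B.c))
for arow in query:
    session_new.merge(arow)

session_new.commit()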

Generic solution to delete records which not used as a foreign key

Here is the setup. There are several models: ParentA, ParentB, ChildAA, ChildBA, and so on.
The relationship between ParentX and ChildXY is that ChildXY has a foreign key to ParentX.
For example:
# this is ParentA
class ParentA(Base):
    __tablename__ = 'parenta'

    id = Column(Integer, primary_key=True)
    name = Column(String(12))
    need_delete = Column(Integer)
    children = relationship("ChildAA",
                            back_populates="parent")

# this is ChildAA
class ChildAA(Base):
    __tablename__ = 'childaa'

    name = Column(String(12))
    id = Column(Integer, primary_key=True)
    need_delete = Column(Integer)
    parenta_id = Column(Integer, ForeignKey('parenta.id'))
    parenta = relationship("ParentA")
#this is ParentB
........
And I want to delete all the records (all the childX and parentX included) whose 'need_delete' attribute is 1 and which haven't been used as a foreign key by a child table. I found a direct but complicated way:
I can first go through all the childX tables and safely remove records, and then go to the parentX tables and delete records with a code block for each, one by one:
# deletion is for ParentA
for parent in session.query(ParentA).join(ParentA.children).group_by(ParentA).having(func.count(ChildAA.id) == 0):
    if parent.need_delete == 1:
        session.delete(parent)

# deletion is for ParentB
......

# deletion is for ParentC
.....

session.commit()
And this is hard-coded. Is there a generic way to delete records that aren't currently being used as a foreign key?
You could use NOT EXISTS, an antijoin, to query those parents which have no children and need delete:
from sqlalchemy import inspect

# After you've cleaned up the child tables:
# (Replace the triple dot with the rest of your parent types)
for parent_type in [ParentA, ParentB, ...]:
    # Query for `parent_type` rows that need delete
    q = session.query(parent_type).filter(parent_type.need_delete == 1)
    # Go through all the relationships
    for rel in inspect(parent_type).relationships:
        # Add a NOT EXISTS(...) to the query predicates (the antijoin)
        q = q.filter(~getattr(parent_type, rel.key).any())
    # Issue a bulk delete. Replace `False` with 'fetch',
    # if you do need to synchronize the deletions with the ongoing
    # SQLA session. In your example you commit after the deletions,
    # which expires instances in session, so no synchronization is
    # required.
    q.delete(synchronize_session=False)

...

session.commit()
Instead of first querying all the instances into the session and marking them for deletion one by one, this uses a bulk delete.
Do note that you must be explicit about your relationships, and the parent side must be defined. If you have foreign keys referring to parent tables that are not defined as a SQLAlchemy relationship on the parent, you'll probably get unwanted deletions of children (depending on how the foreign key constraints have been configured).
Another approach could be to configure your foreign key constraints to restrict deletions and handle the raised errors in a subtransaction (savepoint), but I suppose you've already set your schema up and that'd require altering the existing foreign key constraints.
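For a rough sense of what the antijoin compiles to, you can print the query built for a single parent type; the SQL in the comment below is illustrative and the exact text varies by dialect:

q = session.query(ParentA).filter(ParentA.need_delete == 1,
                                  ~ParentA.children.any())
print(q)
# SELECT parenta.id, parenta.name, parenta.need_delete
# FROM parenta
# WHERE parenta.need_delete = ? AND NOT (EXISTS (SELECT 1
#     FROM childaa
#     WHERE parenta.id = childaa.parenta_id))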

sqlalchemy one-to-many ORM update error

I have two tables, Eca_users and Eca_user_emails; one user can have many emails. I receive JSON with users and their emails, and I want to load them into an MS SQL database. Users can update their emails, so in this JSON I can get the same users with new (or changed) emails.
My code:
# some imports here

Base = declarative_base()

class Eca_users(Base):
    __tablename__ = 'eca_users'

    sql_id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    first_id = sqlalchemy.Column(sqlalchemy.String(15))
    name = sqlalchemy.Column(sqlalchemy.String(200))
    main_email = sqlalchemy.Column(sqlalchemy.String(200))
    user_emails = relationship("Eca_user_emails", backref=backref('eca_users'))

class Eca_user_emails(Base):
    __tablename__ = 'user_emails'

    sql_id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    email_address = Column(String(200), nullable=False)
    status = Column(String(10), nullable=False)
    active = Column(DateTime, nullable=True)
    sql_user_id = Column(Integer, ForeignKey('eca_users.sql_id'))

def main():
    engine = sqlalchemy.create_engine('mssql+pymssql://user:pass/ECAusers?charset=utf8')
    Session = sessionmaker()
    Session.configure(bind=engine)
    session = Session()

    # then I get my JSON, parse it and...
    query = session.query(Eca_users).filter(Eca_users.first_id == str(user_id))
    if query.count() == 0:
        pass  # not interesting now
    else:
        for exstUser in query:
            exstUser.name = name  # update user info
            exstUser.user_emails = []  # empty old emails
            # creating new Email obj
            newEmail = Eca_user_emails(email_address=email_record['email'],
                                       status=email_record['status'],
                                       active=active_date)
            exstUser.user_emails.append(newEmail)  # and I get the error here because of autoflush
    session.commit()

if __name__ == '__main__':
    main()
Error message:
sqlalchemy.exc.IntegrityError: ...
[SQL: 'UPDATE user_emails SET sql_user_id=%(sql_user_id)s WHERE user_emails.sql_id = %(user_emails_sql_id)s'] [parameters: {'sql_user_id': None, 'user_emails_sql_id': Decimal('1')}]
I can't figure out why this sql_user_id is None :(
When I check the exstUser and newEmail objects in the debugger, everything looks fine; all the references are OK. The session object and its dirty attribute also look OK in the debugger (sql_user_id is set on the Eca_user_emails object).
And what is strangest to me: this code worked absolutely fine when it had no main function, just all the code after the class declarations. But after I wrote the main declaration and put all the code in it, I started to get this error.
I am completely new to Python, so maybe this is one of those stupid mistakes...
Any ideas how to fix it and what is the reason? Thanks for reading this :)
By the way: Python 3.4, sqlalchemy 1.0, SQL Server 2012
sql_user_id is None because by default SQLAlchemy nulls out the foreign key when you remove a child object from a relationship; that is, when you clear exstUser.user_emails, SQLAlchemy sets sql_user_id to None for all those instances. If you want SQLAlchemy to issue DELETEs for Eca_user_emails instances when they are detached from Eca_users, you need to add the delete-orphan cascade option to the user_emails relationship. If you want SQLAlchemy to issue DELETEs for Eca_user_emails instances when an Eca_users instance is deleted, you need to add the delete cascade option to the user_emails relationship.
user_emails = relationship("Eca_user_emails", backref=backref('eca_users'), cascade="save-update, merge, delete, delete-orphan")
You can find more information about cascades in the SQLAlchemy docs.
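With that cascade in place, the update flow from the question behaves as intended. A short sketch, reusing the question's own names, of what the delete-orphan cascade changes:

# With cascade="save-update, merge, delete, delete-orphan" on user_emails:
exstUser.user_emails = []  # old Eca_user_emails rows become orphans -> DELETE, not sql_user_id = NULL
newEmail = Eca_user_emails(email_address=email_record['email'],
                           status=email_record['status'],
                           active=active_date)
exstUser.user_emails.append(newEmail)
session.commit()  # DELETEs the orphans and INSERTs the new email row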
