In a project using Flask-SQLAlchemy, I get some intermittent errors, and I think they might be due to not explicitly using transactions.
I have these two model classes, one for locations and another for closures:
class Location(db.Model):
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    code = sa.Column(sa.String, unique=True)

class LocationPath(db.Model):
    ancestor_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    descendant_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    depth = sa.Column(sa.Integer, default=0, nullable=False)
In a background process, I'm doing a lot of inserts, so I'm bypassing the ORM to use Core:
location_table = Location.__table__
location_path_table = LocationPath.__table__

# compare the table column (not the local variable with itself) to the code
statement = select([location_table.c.id]).where(location_table.c.code == code)
result = db.session.get_bind().execute(statement)
row = result.first()
if row is None:
    statement = location_table.insert().values(**kwargs)
    result = db.session.get_bind().execute(statement)
    new_id = result.inserted_primary_key[0]
    result.close()
else:
    new_id = row[0]  # unpack the id from the one-column row

# save new_id as an ancestor_id or a descendant_id
path = LocationPath.query.filter_by(
    ancestor_id=ancestor_id,
    descendant_id=descendant_id
).first()
if path is None:
    statement = location_path_table.insert().values(
        ancestor_id=ancestor_id,
        descendant_id=descendant_id,
        depth=depth)
    # the line below intermittently generates either of two errors:
    # - the inserted primary key (ancestor/descendant) does not exist
    # - a duplicate key error where the path already exists
    result = db.session.get_bind().execute(statement)
This has resulted in quite a bit of head-scratching on my part, since I get the ancestor_id or descendant_id either from a select or an insert, and I also query the database to see if the path exists before attempting to insert it.
Edit: the code above runs in a loop.
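A minimal sketch of one way to make such a loop robust, assuming the intermittent errors come from the lookups and the inserts running on different connections or transactions (the post does not confirm this). It uses the standard-library sqlite3 module instead of Flask-SQLAlchemy so that it runs on its own, and the helper names get_or_create_location and ensure_path are hypothetical. Every statement goes through the same connection inside one transaction, and the unique constraints, rather than a prior SELECT, decide whether a row is new.

```python
import sqlite3

SCHEMA = """
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    name TEXT,
    code TEXT UNIQUE
);
CREATE TABLE location_path (
    ancestor_id INTEGER NOT NULL REFERENCES location(id),
    descendant_id INTEGER NOT NULL REFERENCES location(id),
    depth INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ancestor_id, descendant_id)
);
"""


def get_or_create_location(conn, code, name):
    """Hypothetical helper: return the id for code, inserting the row if needed."""
    # INSERT OR IGNORE does nothing when the unique code already exists.
    conn.execute(
        "INSERT OR IGNORE INTO location (name, code) VALUES (?, ?)",
        (name, code),
    )
    row = conn.execute(
        "SELECT id FROM location WHERE code = ?", (code,)
    ).fetchone()
    return row[0]  # unpack the scalar id from the one-column row


def ensure_path(conn, ancestor_id, descendant_id, depth):
    """Hypothetical helper: insert the path unless the composite key exists."""
    conn.execute(
        "INSERT OR IGNORE INTO location_path (ancestor_id, descendant_id, depth) "
        "VALUES (?, ?, ?)",
        (ancestor_id, descendant_id, depth),
    )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

with conn:  # one transaction on one connection, committed at the end
    for _ in range(2):  # the second pass hits only existing rows
        parent = get_or_create_location(conn, "EU", "Europe")
        child = get_or_create_location(conn, "FR", "France")
        ensure_path(conn, parent, child, 1)

location_count = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
path_count = conn.execute("SELECT COUNT(*) FROM location_path").fetchone()[0]
```

In the Flask-SQLAlchemy code, the analogous change would be to run the Core statements and the path lookup through the same session or connection, so that they all see the same transaction state.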
In the case of many-to-many relationships, an association table can be used in the form of the Association Object pattern.
I have the following setup of two classes with an M2M relationship through the UserCouncil association table.
class Users(Base):
    name = Column(String, nullable=False)
    email = Column(String, nullable=False, unique=True)
    created_at = Column(DateTime, default=datetime.utcnow)
    password = Column(String, nullable=False)
    salt = Column(String, nullable=False)
    councils = relationship('UserCouncil', back_populates='user')

class Councils(Base):
    name = Column(String, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    users = relationship('UserCouncil', back_populates='council')

class UserCouncil(Base):
    user_id = Column(UUIDType, ForeignKey(Users.id, ondelete='CASCADE'), primary_key=True)
    council_id = Column(UUIDType, ForeignKey(Councils.id, ondelete='CASCADE'), primary_key=True)
    role = Column(Integer, nullable=False)
    user = relationship('Users', back_populates='councils')
    council = relationship('Councils', back_populates='users')
However, in this situation, suppose I want to search for a council with a specific name cname for a given user user1. I can do the following:
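# user1.councils holds UserCouncil association objects, so go through .council
for user_council in user1.councils:
    if user_council.council.name == cname:
        dosomething(user_council.council)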
Or, alternatively, this:
session.query(UserCouncil) \
    .join(Councils) \
    .filter((UserCouncil.user_id == user1.id) & (Councils.name == cname)) \
    .first() \
    .council
While the second one is more similar to raw SQL queries and performs better, the first one is simpler. Is there any other, more idiomatic way of expressing this query which is better performing while also utilizing the relationship linkages instead of explicitly writing traditional joins?
First, I think even the SQL-style query you give as an example might need another round trip to the database to load the UserCouncil.council relationship if it is not already loaded in memory.
I think that, given you want to search directly for the Councils instance by its .name and the user, this is exactly what you should ask for. Below is the query for that, with two options for filtering on user_id (you might be more familiar with the second option, so feel free to use it):
q = (
    select(Councils)
    .filter(Councils.name == councils_name)
    .filter(Councils.users.any(UserCouncil.user_id == user_id))  # v1: this does not require JOIN, but produces the same result as below
    # .join(UserCouncil).filter(UserCouncil.user_id == user_id)  # v2: join, very similar to original SQL
)
council = session.execute(q).scalars().first()
As for making it simpler and more idiomatic, I can only suggest wrapping it in a method or property on the Users class:
class Users(...):
    ...

    def get_council_by_name(self, councils_name):
        q = (
            select(Councils)
            .filter(Councils.name == councils_name)
            .join(UserCouncil).filter(with_parent(self, Users.councils))
        )
        return object_session(self).execute(q).scalars().first()
so that you can later call it as user.get_council_by_name('xxx').
Edit-1: added SQL queries
The v1 option of the first q query above will generate the following SQL:
SELECT councils.id,
       councils.name
FROM councils
WHERE councils.name = :name_1
  AND (EXISTS
        (SELECT 1
         FROM user_councils
         WHERE councils.id = user_councils.council_id
           AND user_councils.user_id = :user_id_1
        )
      )
while the v2 option will generate:
SELECT councils.id,
councils.name
FROM councils
JOIN user_councils ON councils.id = user_councils.council_id
WHERE councils.name = :name_1
AND user_councils.user_id = :user_id_1
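To check that the two generated statements select the same rows, here is a small sketch that runs both against an in-memory SQLite database using the standard-library sqlite3 module. The integer ids (instead of UUIDs), the sample data, and the reduced column set are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE councils (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_councils (
    user_id INTEGER NOT NULL,
    council_id INTEGER NOT NULL REFERENCES councils(id),
    role INTEGER NOT NULL,
    PRIMARY KEY (user_id, council_id)
);
INSERT INTO councils (id, name) VALUES (1, 'budget'), (2, 'safety');
INSERT INTO user_councils (user_id, council_id, role)
VALUES (10, 1, 0), (11, 2, 0);
""")

# v1: correlated EXISTS subquery, no JOIN in the outer query.
EXISTS_SQL = """
SELECT councils.id, councils.name
FROM councils
WHERE councils.name = :name_1
  AND EXISTS (SELECT 1
              FROM user_councils
              WHERE councils.id = user_councils.council_id
                AND user_councils.user_id = :user_id_1)
"""

# v2: explicit JOIN on the association table.
JOIN_SQL = """
SELECT councils.id, councils.name
FROM councils
JOIN user_councils ON councils.id = user_councils.council_id
WHERE councils.name = :name_1
  AND user_councils.user_id = :user_id_1
"""

member = {"name_1": "budget", "user_id_1": 10}      # user 10 sits on 'budget'
non_member = {"name_1": "safety", "user_id_1": 10}  # but not on 'safety'

exists_rows = conn.execute(EXISTS_SQL, member).fetchall()
join_rows = conn.execute(JOIN_SQL, member).fetchall()
exists_miss = conn.execute(EXISTS_SQL, non_member).fetchall()
join_miss = conn.execute(JOIN_SQL, non_member).fetchall()
```

Because (user_id, council_id) is the primary key of the association table, the JOIN variant cannot return a council twice for the same user, so both forms agree here.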
So I am currently trying to build a new app using flask-restful for the backend, and I am still learning since all of this is quite new to me. I already set up everything with several different MySQL tables, detailed next, and all the relationships between them. Everything seems to be working fine, except that inserting new data is quite slow.
Here is a (simplified) explanation of the current setup I have. Basically, I first of all have one Flights table.
class FlightModel(db.Model):
    __tablename__ = "Flights"
    flightid = db.Column(db.Integer, primary_key = True, nullable=False)
    [Other properties]
    reviewid = db.Column(db.Integer, db.ForeignKey('Reviews.reviewid'), index = True, nullable = False)
    review = db.relationship("ReviewModel", back_populates="flight", lazy='joined')
This table then points to a Reviews table, in which I store the global review left by the user.
class ReviewModel(db.Model):
    __tablename__ = "Reviews"
    reviewid = db.Column(db.Integer, primary_key = True, nullable = False)
    [Other properties]
    depAirportReviewid = db.Column(db.Integer, db.ForeignKey('DepAirportReviews.reviewid'), index=True, nullable = False)
    arrAirportReviewid = db.Column(db.Integer, db.ForeignKey('ArrAirportReviews.reviewid'), index=True, nullable = False)
    airlineReviewid = db.Column(db.Integer, db.ForeignKey('AirlineReviews.reviewid'), index=True, nullable = False)
    flight = db.relationship("FlightModel", uselist=False, back_populates="review", lazy='joined')
    depAirportReview = db.relationship("DepAirportReviewModel", back_populates="review", lazy='joined')
    arrAirportReview = db.relationship("ArrAirportReviewModel", back_populates="review", lazy='joined')
    airlineReview = db.relationship("AirlineReviewModel", back_populates="review", lazy='joined')
Then, a more detailed review can be left regarding different aspects of the flights, stored in yet another table (for example, in the following DepAirportReviews table: there are three tables in total at this level).
class DepAirportReviewModel(db.Model):
    __tablename__ = "DepAirportReviews"
    reviewid = db.Column(db.Integer, primary_key = True, nullable = False)
    [Other properties]
    review = db.relationship("ReviewModel", uselist=False, back_populates="depAirportReview", lazy='joined')
The insert process is slow: it typically takes 1 second to insert one flight, which is a problem when I try to bulk insert a few hundred of them.
I understand this is because of all these relationships and the trips to the database they imply, in order to retrieve the ids from the different tables. Is that correct? Is there anything I could do to solve this, or will I need to redesign the tables to remove some of these relationships?
Thanks for any help!
EDIT: displaying the executed SQL directly showed what I expected: 7 simple queries in total are executed for each insertion, each taking ~300ms. That is quite long, and I guess it is mostly due to the time needed to reach the server. Nothing to be done except removing some foreign keys, right?
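If most of the time is round-trip latency, one thing worth trying before dropping foreign keys is sending the rows in batches: one multi-row insert per table and a single commit, instead of several statements per flight. The sketch below only illustrates the idea with the standard-library sqlite3 module. The comment and number columns, the two-table schema, and assigning ids on the client are all assumptions, and client-side id assignment is only safe when a single process writes. Whether a given MySQL driver turns executemany into one round trip depends on the driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Reviews (reviewid INTEGER PRIMARY KEY, comment TEXT);
CREATE TABLE Flights (
    flightid INTEGER PRIMARY KEY,
    number TEXT,
    reviewid INTEGER NOT NULL REFERENCES Reviews(reviewid)
);
""")

# Hypothetical input: (flight number, review comment) pairs.
incoming = [(f"XY{n}", f"comment {n}") for n in range(300)]

with conn:  # one transaction, one commit for the whole batch
    # Assumption: single writer, so ids can be reserved from the current maximum.
    start = conn.execute(
        "SELECT COALESCE(MAX(reviewid), 0) FROM Reviews"
    ).fetchone()[0] + 1
    review_ids = range(start, start + len(incoming))

    # Parents first, so the foreign key on Flights is satisfied.
    conn.executemany(
        "INSERT INTO Reviews (reviewid, comment) VALUES (?, ?)",
        [(rid, comment) for rid, (_, comment) in zip(review_ids, incoming)],
    )
    conn.executemany(
        "INSERT INTO Flights (number, reviewid) VALUES (?, ?)",
        [(number, rid) for rid, (number, _) in zip(review_ids, incoming)],
    )

flight_count = conn.execute("SELECT COUNT(*) FROM Flights").fetchone()[0]
linked_count = conn.execute(
    "SELECT COUNT(*) FROM Flights JOIN Reviews USING (reviewid)"
).fetchone()[0]
```

The statement count then grows with the number of tables rather than with the number of flights.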
Here's my ORM entity class. The primary key is composite because id_string may be the same for different users (identified by uid). When creating a table based on this class, Postgres raises this error:
ProgrammingError: (ProgrammingError) there is no unique constraint matching given keys for referenced table "sync_entities"
From it, I understood that I need to add something to the ForeignKey() argument of parent_id_string, and I think that something is the current record's uid.
Would you suggest using a different primary key (an auto-incrementing integer), or is there some other way?
class SyncEntity(Base):
    __tablename__ = 'sync_entities'
    __table_args__ = (ForeignKeyConstraint(['uid'], ['users.uid'], ondelete='CASCADE'), {})

    uid = Column(BigInteger, primary_key=True)
    id_string = Column(String, primary_key=True)
    parent_id_string = Column(String, ForeignKey('sync_entities.id_string'))
    children = relation('SyncEntity',
                        primaryjoin=('sync_entities.c.id_string==sync_entities.c.parent_id_string'),
                        backref=backref('parent',
                                        remote_side=[id_string]))
    # old_parent_id = ...
    version = Column(BigInteger)
    mtime = Column(BigInteger)
    ctime = Column(BigInteger)
    name = Column(String)
    non_unique_name = Column(String)
    sync_timestamp = Column(BigInteger)
    server_defined_unique_tag = Column(String)
    position_in_parent = Column(BigInteger)
    insert_after_item_id = Column(String, ForeignKey('sync_entities.id_string'))
    insert_after = relation('SyncEntity',
                            primaryjoin=('sync_entities.c.id_string==sync_entities.c.insert_after_item_id'),
                            remote_side=[id_string])
    deleted = Column(Boolean)
    originator_cache_guid = Column(String)
    originator_client_item_id = Column(String)
    specifics = Column(LargeBinary)
    folder = Column(Boolean)
    client_defined_unique_tag = Column(String)
    ordinal_in_parent = Column(LargeBinary)
Using an auto-incremented integer as the primary key is usually the best approach. Any values that seem to be unique in the system may turn out to be duplicated in the future, and if you relied on their uniqueness, you're in deep trouble.
However, if there is a reason to require a certain pair (or triple) of values in each row to be unique, just add a unique constraint to your table, but use an auto-increment integer as the primary key. Then, if requirements change, you can alter, remove, or relax the unique constraint without making changes elsewhere.
Also, if you're using simple integer keys, your joins are simpler and can be performed faster by the DBMS.
I think I came up with a good idea: create composite foreign key constraints in the __table_args__ member, such as (parent_id_string, uid) and (insert_after_item_id, uid), and modify the primaryjoin statements accordingly.
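The error in the question arises because a foreign key must reference a column set that is unique in the target table, and id_string alone is not. A sketch of the composite self-referential key, written as plain DDL and run through the standard-library sqlite3 module rather than through SQLAlchemy or Postgres, so it only illustrates the constraint itself. The table is reduced to three columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
CREATE TABLE sync_entities (
    uid INTEGER NOT NULL,
    id_string TEXT NOT NULL,
    parent_id_string TEXT,
    PRIMARY KEY (uid, id_string),
    -- The parent must belong to the same uid as the child.
    FOREIGN KEY (uid, parent_id_string)
        REFERENCES sync_entities (uid, id_string)
)
""")

# A root entity (no parent) and a child for user 1.
conn.execute("INSERT INTO sync_entities VALUES (1, 'root', NULL)")
conn.execute("INSERT INTO sync_entities VALUES (1, 'child', 'root')")

# User 2 has no entity named 'root', so this parent reference is rejected.
try:
    conn.execute("INSERT INTO sync_entities VALUES (2, 'orphan', 'root')")
    cross_user_rejected = False
except sqlite3.IntegrityError:
    cross_user_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sync_entities").fetchone()[0]
```

In the model, the same shape would be expressed with ForeignKeyConstraint entries listing both columns, in the same way the class already does for users.uid.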
So I'm quite new to SQLAlchemy.
I have a model Showing which has about 10,000 rows in the table. Here is the class:
class Showing(Base):
    __tablename__ = "showings"

    id = Column(Integer, primary_key=True)
    time = Column(DateTime)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'))
    cinema_id = Column(Integer, ForeignKey('cinemas.id'))

    def __eq__(self, other):
        if self.time == other.time and self.cinema == other.cinema and self.film == other.film:
            return True
        else:
            return False
Could anyone give me some guidance on the fastest way to insert a new showing if it doesn't already exist? I think it is slightly more complicated because a showing is only unique if the combination of time, cinema, and film is unique.
I currently have this code:
def AddShowings(self, showing_times, cinema, film):
    all_showings = self.session.query(Showing).options(joinedload(Showing.cinema), joinedload(Showing.film)).all()
    for showing_time in showing_times:
        tmp_showing = Showing(time=showing_time[0], film=film, cinema=cinema, link=showing_time[1])
        if tmp_showing not in all_showings:
            self.session.add(tmp_showing)
            self.session.commit()
            all_showings.append(tmp_showing)
which works, but seems to be very slow. Any help is much appreciated.
If any such object is unique based on a combination of columns, you need to mark these as a composite primary key. Add the primary_key=True keyword parameter to each of these columns, dropping your id column altogether:
class Showing(Base):
    __tablename__ = "showings"

    time = Column(DateTime, primary_key=True)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'), primary_key=True)
    cinema_id = Column(Integer, ForeignKey('cinemas.id'), primary_key=True)
That way your database can handle these rows more efficiently (no need for an incrementing column), and SQLAlchemy now automatically knows if two instances of Showing are the same thing.
I believe you can then just merge your new Showing back into the session:
def AddShowings(self, showing_times, cinema, film):
    for showing_time in showing_times:
        self.session.merge(
            Showing(time=showing_time[0], link=showing_time[1],
                    film=film, cinema=cinema)
        )
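Independently of the key design, part of the slowness in the question's loop likely comes from the membership test on a list, which scans every loaded showing and calls __eq__ on each one. A sketch of an alternative that keeps the existing (time, cinema_id, film_id) keys in a set, so each check is a hash lookup. The function name new_showings, the tuple layout, and the sample data are assumptions for illustration, and no database is involved.

```python
from datetime import datetime


def new_showings(existing_keys, showing_times, cinema_id, film_id):
    """Return the rows that still need inserting.

    existing_keys is a set of (time, cinema_id, film_id) tuples and is
    updated in place, so duplicates within one batch are skipped as well.
    """
    to_insert = []
    for time, link in showing_times:
        key = (time, cinema_id, film_id)
        if key in existing_keys:  # average O(1) set lookup
            continue
        existing_keys.add(key)
        to_insert.append(
            {"time": time, "link": link, "cinema_id": cinema_id, "film_id": film_id}
        )
    return to_insert


# Keys that are assumed to be in the table already.
existing = {(datetime(2024, 1, 1, 18, 0), 1, 7)}

incoming = [
    (datetime(2024, 1, 1, 18, 0), "http://example.com/a"),  # already stored
    (datetime(2024, 1, 1, 21, 0), "http://example.com/b"),  # new
    (datetime(2024, 1, 1, 21, 0), "http://example.com/b"),  # repeated in batch
]

rows = new_showings(existing, incoming, cinema_id=1, film_id=7)
```

The returned rows could then be written in one batch with a single commit, rather than committing once per showing.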
First, the database overview:
competitors - people who compete
competitions - things that people compete at
competition_registrations - Competitors registered for a particular competition
event - An "event" at a competition.
events_couples - A couple (2 competitors) competing in an event.
First, EventCouple, a Python class corresponding to events_couples, is:
class EventCouple(Base):
    __tablename__ = 'events_couples'

    competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
    event_id = Column(Integer, ForeignKey('events.id'), primary_key=True)
    leader_id = Column(Integer)
    follower_id = Column(Integer)

    __table_args__ = (
        ForeignKeyConstraint(['competition_id', 'leader_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
        ForeignKeyConstraint(['competition_id', 'follower_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
        {}
    )
I have a Python class, CompetitorRegistration, that corresponds to a record/row in competition_registrations. A competitor, who is registered, can compete in multiple events, but either as a "leader", or a "follower". I'd like to add to CompetitorRegistration an attribute leading, that is a list of EventCouple where the competition_id and leader_id match. This is my CompetitorRegistration class, complete with attempt:
class CompetitorRegistration(Base):
    __tablename__ = 'competition_registrations'

    competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
    competitor_id = Column(Integer, ForeignKey('competitors.id'), primary_key=True)
    email = Column(String(255))
    affiliation_id = Column(Integer, ForeignKey('affiliation.id'))
    is_student = Column(Boolean)
    registered_time = Column(DateTime)
    leader_number = Column(Integer)

    leading = relationship('EventCouple', primaryjoin=and_('CompetitorRegistration.competition_id == EventCouple.competition_id', 'CompetitorRegistration.competitor_id == EventCouple.leader_id'))
    following = relationship('EventCouple', primaryjoin='CompetitorRegistration.competition_id == EventCouple.competition_id and CompetitorRegistration.competitor_id == EventCouple.follower_id')
However, I get:
ArgumentError: Could not determine relationship direction for primaryjoin
condition 'CompetitorRegistration.competition_id == EventCouple.competition_id
AND CompetitorRegistration.competitor_id == EventCouple.leader_id', on
relationship CompetitorRegistration.leading. Ensure that the referencing Column
objects have a ForeignKey present, or are otherwise part of a
ForeignKeyConstraint on their parent Table, or specify the foreign_keys parameter
to this relationship.
Thanks for any help, & let me know if more info is needed on the schema.
Also, another attempt of mine is visible in following. It did not raise an error, but it didn't give correct results either: it joined only on competition_id and completely ignored follower_id.
The condition for leading mixes an expression construct (and_()) with strings that are meant to be eval()ed. The condition for following mixes Python and SQL operators: Python's and does not do what you expect here. Below are corrected examples using both variants:
leading = relationship('EventCouple', primaryjoin=(
    (competition_id==EventCouple.competition_id) &
    (competitor_id==EventCouple.leader_id)))

leading = relationship('EventCouple', primaryjoin=and_(
    competition_id==EventCouple.competition_id,
    competitor_id==EventCouple.leader_id))

following = relationship('EventCouple', primaryjoin=
    '(CompetitorRegistration.competition_id==EventCouple.competition_id) '
    '& (CompetitorRegistration.competitor_id==EventCouple.follower_id)')
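To see why and silently drops half of the condition, here is a toy illustration in plain Python. The Expr class is a hypothetical stand-in for an expression object with overloaded operators, not SQLAlchemy's real class. It assumes that a comparison between two different columns evaluates as falsy, which is consistent with the behaviour reported in the question (only the first comparison survived).

```python
class Expr:
    """Toy stand-in for a SQL expression; not SQLAlchemy's actual class."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning a bool.
        return Expr(f"{self.text} = {other.text}")

    def __and__(self, other):
        # The & operator can be overloaded, so it can build a SQL AND.
        return Expr(f"({self.text}) AND ({other.text})")

    def __bool__(self):
        # Assumption for this demo: the comparison counts as falsy.
        return False


a = Expr("reg.competition_id") == Expr("couple.competition_id")
b = Expr("reg.competitor_id") == Expr("couple.follower_id")

# 'and' cannot be overloaded: Python checks bool(a) and, because it is
# falsy, returns a itself without ever looking at b.
with_and = a and b

# '&' calls Expr.__and__ and keeps both comparisons.
with_amp = a & b
```

Here with_and is just the first comparison, while with_amp carries both, which is why the string passed to primaryjoin has to use & (or and_()) rather than and.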