SQLAlchemy low performance when using a relationship - python

I have SQLAlchemy models in my Flask application:
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    photos = db.relationship('Photo', lazy='joined')
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    photo = db.Column(db.String(255))
When I query a user, I get the user with their photos automatically. But I noticed that if a user has a lot of photos, the query u = User.query.get(1) becomes very slow. I did the same thing manually with the lazy='noload' option:
u = User.query.get(1)
photos = Photo.query.filter_by(user_id=1)
and on the same data it is several times faster. Where is the problem: is the SQL join slow (I don't think so, because it starts to hang at 100-1kk photo objects, which is not that much data), or is something wrong in SQLAlchemy?

From my experience, I suggest getting familiar with the SQLAlchemy documentation on relationship loading techniques. Even though relationships are easy and useful, on larger data sets it is sometimes better not to use them, or even to execute plain-text SQL, for performance reasons.
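As a rough illustration of that advice, the sketch below keeps the relationship on the default lazy="select" and opts in to eager loading only where the photos are needed. It uses plain SQLAlchemy (1.4 or later) with an in-memory SQLite database rather than Flask-SQLAlchemy; the sample data and the variable names are assumptions made for the example, not part of the original question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    # lazy="select" (the default): photos are loaded only when first accessed.
    photos = relationship("Photo", lazy="select")


class Photo(Base):
    __tablename__ = "photos"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    photo = Column(String(255))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical sample data: one user with three photos.
    session.add(User(id=1, photos=[Photo(photo=f"p{i}.jpg") for i in range(3)]))
    session.commit()

    # Cheap lookup: no JOIN is issued and the photos are not touched.
    user = session.get(User, 1)

    # Opt in to eager loading only where the photos are actually needed.
    stmt = select(User).options(selectinload(User.photos)).where(User.id == 1)
    user_with_photos = session.execute(stmt).scalars().one()
    photo_count = len(user_with_photos.photos)

    # Or fetch just the column you need and skip building Photo objects.
    names = session.execute(
        select(Photo.photo).where(Photo.user_id == 1)
    ).scalars().all()
```

Selecting plain columns instead of full entities is often the cheapest option when a user has very many photos, because much of the cost tends to come from constructing ORM objects rather than from the SQL itself.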

Related

Why use "db.*" in SqlAlchemy?

I am working on a project using Flask and SQLAlchemy. My colleagues and I found two ways to define a table. Both work, but what is the difference?
Possibility I
base = declarative_base()
class Story(base):
    __tablename__ = 'stories'
    user_id = Column(Integer, primary_key=True)
    email = Column(String(100), unique=True)
    password = Column(String(100), unique=True)
Possibility II
db = SQLAlchemy()
class Story(db.Model):
    __tablename__ = 'stories'
    user_id = db.Column(Integer, primary_key=True)
    email = db.Column(String(100), unique=True)
    password = db.Column(String(100), unique=True)
We want to choose one option, but which one?
It is obvious that the two classes inherit from different parent classes, but what is each of these two possibilities used for?
Possibility 1 is raw SQLAlchemy declarative mapping.
Possibility 2 is Flask-SQLAlchemy.
Both map a class to SQL table (or something more exotic in SQL) in a declarative style, i.e. the class is mapped to an automatically generated table.
Choosing which one to use however is a matter of opinion.
I'll say that using Flask-SQLAlchemy is obviously locking the application to Flask, but that's basically a non-problem since switching frameworks is very uncommon.
NB. __tablename__ is optional with Flask-SQLAlchemy.

SQLAlchemy one-to-many relationship - how to properly get the 'many' collection

I'm not sure I properly understand how to get the collection part of the one-to-many relationship.
class ProjectReport(db.Model):
    __tablename__ = "project_reports"
    id = db.Column(UUID, primary_key=True, default=uuid.uuid4)
    project_id = db.Column(UUID, db.ForeignKey("projects.id"), nullable=False)
    entries = db.relationship("ProducerEntry", backref="project_report", lazy="dynamic")
class ProducerEntry(Entry):
    __tablename__ = "producer_entries"
    __mapper_args__ = {"polymorphic_identity": "Entry"}
    id = db.Column(UUID, db.ForeignKey("entries.id"), primary_key=True)
    project_id = db.Column(UUID, db.ForeignKey("projects.id"), nullable=False)
    project_report_id = db.Column(UUID, db.ForeignKey("project_reports.id"), nullable=True)
My problem is that I can't just access the entries field.
for entry in self.entries:
    do_something(entry)
This raises NotImplementedError.
I managed to get the data via a hybrid property, but that seems like overkill since I already have the relationship, and it would make further logic more complex later on.
@hybrid_property
def entries(self):
    return ProducerEntry.query.filter_by(project_report_id=self.id)
An additional piece of information: ProjectReport basically holds the common columns of the Entry and Project models, and project_report_id is nullable because the entries and projects are generated first, and then I generate the project reports from them. This is how I create the reports:
...
project_report = ProjectReport(date_order=entry.date_order, project_id=entry.project.id)
project_report.entries.append(entry)
...
As far as I know I don't have to add the project_report_id to the producer entry after this.
What am I missing here?
Well yeah, with lazy="dynamic" that relationship field returns a query rather than a list, so I simply should have called:
self.entries.all()
Or any other method that executes a query.
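To make the point about lazy="dynamic" concrete, here is a minimal sketch with simplified, hypothetical models (Report and Entry stand in for ProjectReport and ProducerEntry, and integer keys replace the UUIDs). It assumes plain SQLAlchemy 1.4 or later with an in-memory SQLite database, and shows that the attribute behaves like a query that can be executed or filtered further.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Report(Base):
    __tablename__ = "reports"
    id = Column(Integer, primary_key=True)
    # lazy="dynamic": the attribute is a query object, not a loaded list.
    entries = relationship("Entry", backref="report", lazy="dynamic")


class Entry(Base):
    __tablename__ = "entries"
    id = Column(Integer, primary_key=True)
    kind = Column(String(20))
    report_id = Column(Integer, ForeignKey("reports.id"), nullable=True)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    report = Report()
    # append() works on the dynamic collection; the foreign key is set on flush.
    report.entries.append(Entry(kind="producer"))
    report.entries.append(Entry(kind="other"))
    session.add(report)
    session.commit()

    # Execute the query to get a plain list.
    all_entries = report.entries.all()
    # Or narrow it down in SQL before loading anything.
    producer_entries = report.entries.filter_by(kind="producer").all()
    entry_count = report.entries.count()
```

The trade-off is that a dynamic relationship issues a new query on every access, which suits large collections that are usually filtered, but is unnecessary for small ones.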

SQLalchemy built relationship between classes

I am running into a conceptual problem that I do not know how to approach, which might be due to my lack of knowledge of SQLAlchemy. I have two classes, People and Person, and I want each of them to have a column that shares its id with the other, using the relationship function.
I have an endpoint in views.py which instantiates those two classes and establishes a child/parent relationship. Looking at the database results, however, only People, the parent class, has the id stored in its table, while the people column in the Person table is None.
I know the id in person is only generated after the commit() statement and is therefore None for Person. Is there a way to solve this elegantly, or do I need to first query the current people instance, retrieve its id, set the id in the person table and then commit() again?
I hope my question makes sense, thank you.
'''
model.py
'''
class People(Model):
    __tablename__ = 'people'
    id = Column(Integer, primary_key=True)
    person = relationship('Person', back_populates='people')
    person_id = Column(Integer, ForeignKey('people.id'))
class Person(Model):
    __tablename__ = 'people'
    id = Column(Integer, primary_key=True)
    people = relationship('People', uselist=False, back_populates='person')
'''
views.py
'''
@main.route('/', methods=['GET', 'POST'])
def index():
    people = People()
    person = Person(people_id = ?)
    people.person = person
    session.add(person)
    session.add(people)
    session.commit()
I regret that I have not yet understood your question. However, since your code contains some errors, I will first write you my corrected variant.
class People(Model):
    __tablename__ = 'people'
    id = Column(Integer, primary_key=True)
    person = relationship('Person', back_populates='people')
    person_id = Column(Integer, ForeignKey('person.id'))
class Person(Model):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    people = relationship('People', back_populates='person')
def index():
    person = Person()
    people = People()
    people.person = person
    session.add(person)
    session.add(people)
    session.commit()
gittert's question seems justified to me. It makes no sense to store a ForeignKey in both tables, each referencing the identifier of the other model.
What do you want to achieve?
If you're looking for an actual column in your database for your relationships, you won't find one. Your .people and .person attributes are relationships that exist only on the Python side; they add no columns of their own and work through the person_id foreign key column.
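The following sketch, which assumes plain SQLAlchemy 1.4 or later with an in-memory SQLite database and a generic Base instead of the question's Model, illustrates why no second commit() is needed: assigning the relationship is enough, and the foreign key column is filled in when the session flushes. The variable names fk_before_flush and fk_after_flush are only there for the demonstration.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    # One Person can be referenced by many People rows.
    people = relationship("People", back_populates="person")


class People(Base):
    __tablename__ = "people"
    id = Column(Integer, primary_key=True)
    # The only real column that links the two tables.
    person_id = Column(Integer, ForeignKey("person.id"))
    person = relationship("Person", back_populates="people")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    person = Person()
    people = People(person=person)
    # Adding people also adds person through the default save-update cascade.
    session.add(people)

    fk_before_flush = people.person_id  # not assigned yet

    # flush() inserts person first, then copies its new id into people.person_id.
    session.flush()
    fk_after_flush = people.person_id
    new_person_id = person.id

    session.commit()
```

In other words, the ORM orders the INSERT statements itself, so the primary key never has to be looked up and copied by hand.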

Query items from grandchildren objects

I have a Flask python application that has a set of related tables that are chained together through foreign keys. I would like to be able to return an aggregate list of records from one table that are related to a distant table. However, I am struggling to understand how sqlalchemy does this through object relationships.
For example, there are three objects I'd like to join (challenge and badge) with two tables (talent_challenge and badge) to be able to query for all badges related to a specific challenge. In SQL, this would look something like:
SELECT b.id, b.name
FROM badge b
INNER JOIN talent_challenge tc ON tc.talent_id = b.talent_id
WHERE tc.challenge_id = 21
The 'talent' and 'challenge' tables are not needed in this case, since I only need the talent and challenge IDs (in 'talent_challenge') for the relationship. All of the interesting detail is in the badge table.
I am able to use sqlalchemy to access the related talent from a challenge using:
talents = db.relationship('TalentModel', secondary='talent_challenge')
And I can then reference talent.badges for each of those talents to get the relevant badges related to my initial challenge. However, there can be redundancy, and this list of badges isn't contained in a single object.
Stripped-down versions of the three models are:
class TalentModel(db.Model):
    __tablename__ = 'talent'
    # Identity
    id = db.Column(db.Integer, primary_key=True)
    # Relationships
    challenges = db.relationship('ChallengeModel', secondary='talent_challenge',)
    # badges (derived as backref from BadgeModel)
class ChallengeModel(db.Model):
    __tablename__ = 'challenge'
    # Identity
    id = db.Column(db.Integer, primary_key=True)
    member_id = db.Column(db.Integer, db.ForeignKey('member.id'))
    # Relationships
    talents = db.relationship('TalentModel', secondary='talent_challenge', order_by='desc(TalentModel.created_at)')
class BadgeModel(db.Model):
    __tablename__ = 'badge'
    # Identity
    id = db.Column(db.Integer, primary_key=True)
    talent_id = db.Column(db.Integer, db.ForeignKey('talent.id'))
    # Parents
    talent = db.relationship('TalentModel', foreign_keys=[talent_id], backref="badges")
I also have a model for the associative table, 'talent_challenge':
class TalentChallengeModel(db.Model):
    __tablename__ = 'talent_challenge'
    # Identity
    id = db.Column(db.Integer, primary_key=True)
    talent_id = db.Column(db.Integer, db.ForeignKey('talent.id'))
    challenge_id = db.Column(db.Integer, db.ForeignKey('challenge.id'))
    # Parents
    talent = db.relationship('TalentModel', uselist=False, foreign_keys=[talent_id])
    challenge = db.relationship('ChallengeModel', uselist=False, foreign_keys=[challenge_id])
I would like to better understand sqlalchemy (or specifically, flask-sqlalchemy) to allow me to construct this list of badges from the challenge object. Is db.session.query of BadgeModel my only option?
UPDATED 1/23/2015:
My blocker on my project was solved by using the following:
@property
def badges(self):
    from app.models.sift import BadgeModel
    from app.models.relationships.talent import TalentChallengeModel
    the_badges = BadgeModel.query\
        .join(TalentChallengeModel, TalentChallengeModel.talent_id==BadgeModel.talent_id)\
        .filter(TalentChallengeModel.challenge_id==self.id)\
        .all()
    return the_badges
Wrapping the query in a function got around the issues I was having with the name BadgeModel not being defined and not being importable in the model otherwise. The @property decorator allows me to reference this simply as challenge.badges later in the view.
However, I am still interested in understanding how to do this as a relationship. Some searching elsewhere led me to believe this would work:
badges = db.relationship('BadgeModel',
    secondary="join(BadgeModel, TalentChallengeModel, BadgeModel.talent_id == TalentChallengeModel.talent_id)",
    secondaryjoin="remote([id]) == foreign(TalentChallengeModel.challenge_id)",
    primaryjoin="BadgeModel.talent_id == foreign(TalentChallengeModel.talent_id)",
    viewonly=True,
)
Because of other unresolved issues in my application environment, I can't fully test this (e.g., adding this code breaks Flask-User in my site) but would like to know if this is correct syntax and if there is any disadvantage to this over the query-in-function solution.
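For comparison, the query-in-function approach can be reproduced outside Flask as a small self-contained sketch. It assumes plain SQLAlchemy 1.4 or later with an in-memory SQLite database; the Badge and TalentChallenge classes are cut-down stand-ins for the models above (foreign keys to talent and challenge are omitted), and badges_for_challenge, the name column values and the sample rows are hypothetical. It does not verify the relationship() syntax proposed above; it only shows the explicit join, with DISTINCT added to remove the redundancy mentioned earlier.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Badge(Base):
    __tablename__ = "badge"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    talent_id = Column(Integer)  # FK to talent.id omitted in this sketch


class TalentChallenge(Base):
    __tablename__ = "talent_challenge"
    id = Column(Integer, primary_key=True)
    talent_id = Column(Integer)     # FK to talent.id omitted in this sketch
    challenge_id = Column(Integer)  # FK to challenge.id omitted in this sketch


def badges_for_challenge(session, challenge_id):
    """Return each badge linked to the challenge once, ordered by badge id."""
    stmt = (
        select(Badge)
        .join(TalentChallenge, TalentChallenge.talent_id == Badge.talent_id)
        .where(TalentChallenge.challenge_id == challenge_id)
        .distinct()
        .order_by(Badge.id)
    )
    return session.execute(stmt).scalars().all()


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Badge(id=1, name="gold", talent_id=1),
        Badge(id=2, name="silver", talent_id=2),
        Badge(id=3, name="bronze", talent_id=3),
        # Talent 1 is linked to challenge 21 twice to simulate redundancy.
        TalentChallenge(talent_id=1, challenge_id=21),
        TalentChallenge(talent_id=1, challenge_id=21),
        TalentChallenge(talent_id=2, challenge_id=21),
        TalentChallenge(talent_id=3, challenge_id=22),
    ])
    session.commit()
    badge_names = [b.name for b in badges_for_challenge(session, 21)]
```

Keeping the logic in a function like this stays explicit and easy to test, whereas a view-only relationship would additionally allow eager-loading options; which one fits better depends on how the badges are used.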

What's the proper way to describe an associative object by SQLalchemy the declarative way

I'm looking for a way to describe an associative object the declarative way. Beyond storing the foreign keys in the association table, I need to store information like the creation date of the association.
Today, my model looks like this:
# Define the User class
class User(Base):
    __tablename__ = 'users'
    # Define User fields
    id = schema.Column(types.Integer(unsigned=True),
        schema.Sequence('users_seq_id', optional=True), primary_key=True)
    password = schema.Column(types.Unicode(64), nullable=False)
# Define the UserSubset class
class UserSubset(Base):
    __tablename__ = 'subsets'
    # Define UserSubset fields
    id = schema.Column(types.Integer(unsigned=True),
        schema.Sequence('subsets_seq_id', optional=True), primary_key=True)
    some_short_description = schema.Column(types.Unicode(50), nullable=False)
# Define the subset memberships table
subset_memberships = schema.Table('group_memberships', Base.metadata,
    schema.Column('user_id', types.Integer(unsigned=True), ForeignKey('users.id')),
    schema.Column('subset_id', types.Integer(unsigned=True), ForeignKey('subsets.id')),
    schema.Column('created', types.DateTime(), default=now, nullable=False),
)
Can I connect everything in an associative object? Or should I stop using the declarative way?
What you are using at the moment is just a Many-to-Many-relation. How to work with association objects is described in the docs.
There is also an extension called associationproxy which simplifies the relation.
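As a hedged sketch of the association object pattern, the membership table can be mapped as its own declarative class so that the extra created column becomes a normal attribute. The example assumes plain SQLAlchemy 1.4 or later with an in-memory SQLite database; the SubsetMembership class name, its table name and the sample values are hypothetical, and the unsigned integers and sequences from the question are left out for brevity.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class SubsetMembership(Base):
    """Association object: one row per (user, subset) pair plus extra data."""
    __tablename__ = "subset_memberships"
    user_id = Column(Integer, ForeignKey("users.id"), primary_key=True)
    subset_id = Column(Integer, ForeignKey("subsets.id"), primary_key=True)
    created = Column(DateTime, default=datetime.now, nullable=False)
    user = relationship("User", back_populates="memberships")
    subset = relationship("UserSubset", back_populates="memberships")


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    password = Column(Unicode(64), nullable=False)
    memberships = relationship("SubsetMembership", back_populates="user")


class UserSubset(Base):
    __tablename__ = "subsets"
    id = Column(Integer, primary_key=True)
    some_short_description = Column(Unicode(50), nullable=False)
    memberships = relationship("SubsetMembership", back_populates="subset")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    user = User(password="secret")
    subset = UserSubset(some_short_description="beta testers")
    # The association row is created explicitly, so its columns are accessible.
    membership = SubsetMembership(user=user, subset=subset)
    session.add(membership)  # user and subset are cascaded into the session
    session.commit()

    created_at = user.memberships[0].created
    subset_names = [m.subset.some_short_description for m in user.memberships]
```

Reaching the subsets now takes one extra hop through memberships; that hop is what associationproxy is designed to hide if direct user-to-subset access is preferred.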
As you can see in the manual, configuring a one to many relation is really simple:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    addresses = relation("Address", backref="user")
class Address(Base):
    __tablename__ = 'addresses'
    id = Column(Integer, primary_key=True)
    email = Column(String(50))
    user_id = Column(Integer, ForeignKey('users.id'))
Many-to-many relations aren't much harder:
There’s nothing special about many-to-many with declarative. The secondary argument to relation() still requires a Table object, not a declarative class. The Table should share the same MetaData object used by the declarative base:
keywords = Table('keywords', Base.metadata,
    Column('author_id', Integer, ForeignKey('authors.id')),
    Column('keyword_id', Integer, ForeignKey('keywords.id'))
)
class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    keywords = relation("Keyword", secondary=keywords)
You should generally not map a class and also specify its table in a many-to-many relation, since the ORM may issue duplicate INSERT and DELETE statements.
Anyway, what you seem to be doing might be better served with inheritance. Of course, there can be complex table relations that will be a pathological case for the declarative way, but this doesn't seem to be one of them.
One more thing: code comments should state what the following code does and why, not how it does it. Having a # Define the User class comment is almost like having a line of code saying a = 1 # assign value 1 to variable "a".
