DatabaseSessionIsOver with Pony ORM due to lazy loading? - python

I am using Pony ORM for a Flask solution and I've come across the following. Consider this function:
@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order for s in o.supplier if o.user == user)
    q2 = q.filter(lambda o: o.date >= date and o.date <= date + timedelta(days=7))
    res = q2[:]
    # for r in res:
    #     print r.supplier.name
    return res
When I use the result in a Jinja2 template, which looks like this:
{% for order in res %}
    Supplier: {{ order.supplier.name }}
{% endfor %}
I get a
DatabaseSessionIsOver: Cannot load attribute Supplier[3].name: the database session is over
If I uncomment the for r in res part, it works fine. I suspect some sort of lazy loading is involved that isn't triggered by res = q2[:].
Am I completely missing a point or what's going on here?

I just added prefetch functionality that should solve your problem. You can take working code from the GitHub repository. This feature will be part of the upcoming release Pony ORM 0.5.4.
Now you can write:
q = q.prefetch(Supplier)
or
q = q.prefetch(Order.supplier)
and Pony will automatically load related supplier objects.
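Applied to the function from your question, that would look roughly like this (a sketch; it assumes the prefetch feature from the repository or the 0.5.4 release):
@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order for s in o.supplier if o.user == user)
    q2 = q.filter(lambda o: o.date >= date and o.date <= date + timedelta(days=7))
    q2 = q2.prefetch(Order.supplier)  # load each order's supplier eagerly
    return q2[:]  # order.supplier.name is now safe to use after the session is over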
Below I'll show several queries with prefetching, using the standard Pony example with Students, Groups and Departments.
from pony.orm.examples.presentation import *
Loading Student objects only, without any prefetching:
students = select(s for s in Student)[:]
Loading students together with groups and departments:
students = select(s for s in Student).prefetch(Group, Department)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
The same as above, but specifying attributes instead of entities:
students = select(s for s in Student).prefetch(Student.group, Group.dept)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
Loading students and their courses (many-to-many relationship):
students = select(s for s in Student).prefetch(Student.courses)
for s in students:
    print s.name
    for c in s.courses:  # no additional query to the DB is required
        print c.name
As parameters of the prefetch() method you can specify entities and/or attributes. If you specify an entity, then all to-one attributes of that type will be prefetched. If you specify an attribute, then that specific attribute will be prefetched. To-many attributes are prefetched only when specified explicitly (as in the Student.courses example). Prefetching works recursively, so you can load a long chain of attributes, such as student.group.dept.
When an object is prefetched, all of its attributes are loaded by default, except lazy attributes and to-many attributes. You can prefetch lazy and to-many attributes explicitly if needed.
I hope this new method fully covers your use case. If something doesn't work as expected, please open a new issue on GitHub. You can also discuss the functionality and make feature requests on the Pony ORM mailing list.
P.S. I'm not sure that the repository pattern you use gives you serious benefits. I think it actually increases coupling between template rendering and the repo implementation, because you may need to change the repo implementation (i.e. add new entities to the prefetching list) when the template code starts using new attributes. With the top-level @db_session decorator you can just send the query result to the template and everything happens automatically, without the need for explicit prefetching. But maybe I'm missing something, so I'd be interested to see additional comments about the benefits of using the repository pattern in your case.

This happens because you're trying to access a related object which was not loaded, and since you're accessing it outside of the database session (the function decorated with db_session), Pony raises this exception.
The recommended approach is to use the db_session decorator at the top level, at the same place where you put Flask's app.route decorator:
@app.route('/index')
@db_session
def index():
    ...
    return render_template(...)
This way all calls to the database will be wrapped with the database session, which will be finished after a web page is generated.
If there is a reason to narrow the database session down to a single function, then you need to iterate over the returned objects inside the function decorated with db_session and access all the necessary related objects. Pony will use the most effective way of loading the related objects from the database, avoiding the N+1 query problem. This way Pony extracts all the necessary objects within the db_session scope, while the connection to the database is still active.
--- update:
Right now, for loading the related objects, you should iterate over the query result and call the related object attribute:
for r in res:
    r.supplier.name
It is similar to the code in your example; I just removed the print statement. When you 'touch' the r.supplier.name attribute, Pony loads all non-lazy attributes of the related supplier object. If you need to load lazy attributes, you need to touch each of them separately.
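For example (supplier.notes here is a hypothetical lazy attribute, purely for illustration):
for r in res:
    r.supplier.name   # loads all non-lazy attributes of the supplier
    r.supplier.notes  # a lazy attribute must be touched separately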
It seems we need to introduce a way to specify which related objects should be loaded during query execution. We will add this feature in one of the future releases.

Related

Understanding session with FastAPI dependency

I am new to Python and am studying FastAPI and SQLModel.
Reference link: https://sqlmodel.tiangolo.com/tutorial/fastapi/session-with-dependency/#the-with-block
Here, they have something like this
def create_hero(*, session: Session = Depends(get_session), hero: HeroCreate):
    db_hero = Hero.from_orm(hero)
    session.add(db_hero)
    session.commit()
    session.refresh(db_hero)
    return db_hero
Here I am unable to understand this part
session.add(db_hero)
session.commit()
session.refresh(db_hero)
What is it doing and how does it work? I also couldn't understand this passage from the docs:
In fact, you could think that all that block of code inside of the create_hero() function is still inside a with block for the session, because this is more or less what's happening behind the scenes.
But now, the with block is not explicitly in the function, but in the dependency above:
Here's an explanation from the docs of what a session is:
In the most general sense, the Session establishes all conversations
with the database and represents a “holding zone” for all the objects
which you’ve loaded or associated with it during its lifespan. It
provides the interface where SELECT and other queries are made that
will return and modify ORM-mapped objects. The ORM objects themselves
are maintained inside the Session, inside a structure called the
identity map - a data structure that maintains unique copies of each
object, where “unique” means “only one object with a particular
primary key”.
So
# This line simply creates a Python object
# that SQLAlchemy can "understand".
db_hero = Hero.from_orm(hero)

# This line adds the object `db_hero` to the "holding zone".
session.add(db_hero)

# This line takes all objects from the "holding zone" and puts them in the database.
# In our case we have only one object in this zone,
# but it is possible to have several.
session.commit()

# This line fetches the created row from the database and puts it back into the object.
# That means the object can now have new attributes, for example the id
# that the database set for this new row.
session.refresh(db_hero)
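As for the with block living "in the dependency above": the get_session dependency from the tutorial looks roughly like this (a sketch; the engine URL is illustrative):
from sqlmodel import Session, create_engine

engine = create_engine("sqlite:///database.db")

def get_session():
    # the `with` block is here; the route body runs while
    # execution is suspended at `yield`, so the session stays open
    with Session(engine) as session:
        yield session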

How to paginate in Flask-SQLAlchemy for db.session joined queries?

Say we have the following relationships:
a person can have many email addresses
an email service provider can (obviously) serve multiple email addresses
So it's a many-to-many relationship. I have three tables: emails, providers, and users. The emails table has two foreign keys, for provider and user.
Now, given a specific person, I want to print all the email providers and the email address each one hosts for this person, if it exists. (If the person does not have an email at Gmail, I still want Gmail to be in the result. I believe otherwise I would only need an inner join to solve this.)
I figured out how to do this with the following subqueries (following the sqlalchemy tutorial):
email_subq = db.session.query(Emails).\
    filter(Emails.user_id == current_user.id).\
    subquery()

provider_and_email = db.session.query(Provider, email_subq).\
    outerjoin(email_subq, Provider.emails).\
    all()
This works okay (it returns a 4-tuple of (Provider, user_id, provider_id, email_address), all the information that I want), but I later found out it is not using the Flask BaseQuery class, so the pagination provided by Flask-SQLAlchemy does not work. Apparently db.session.query() does not return a Flask-SQLAlchemy Query instance.
I tried Emails.query.outerjoin(...), but that returns only the columns in the emails table, though I want both the provider info and the emails.
My question: how can I do the same thing with Flask-SQLAlchemy so that I do not have to re-implement pagination that is already there?
I guess the simplest option at this point is to implement my own paginate function, but I'd love to know if there is another proper way of doing this.
I'm not sure if this is going to end up being the long-term solution, and it does not directly address my concern about not using Flask-SQLAlchemy's BaseQuery, but the most trivial way to accomplish what I want is to reimplement the paginate function.
And, in fact, it is pretty easy to use the original Flask-SQLAlchemy routine to do this:
from flask import abort
from flask.ext.sqlalchemy import Pagination

def paginate(query, page, per_page=20, error_out=True):
    if error_out and page < 1:
        abort(404)
    items = query.limit(per_page).offset((page - 1) * per_page).all()
    if not items and page != 1 and error_out:
        abort(404)
    # No need to count if we're on the first page and there are fewer
    # items than we expected.
    if page == 1 and len(items) < per_page:
        total = len(items)
    else:
        total = query.order_by(None).count()
    return Pagination(query, page, per_page, total, items)
Modified from the paginate function found around line 376: https://github.com/mitsuhiko/flask-sqlalchemy/blob/master/flask_sqlalchemy.py
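For instance, a hypothetical view using this standalone helper (the route, template name and per-page count are made up for illustration) might look like:
@app.route('/providers/<int:page>')
def providers_view(page):
    query = db.session.query(Provider, email_subq).\
        outerjoin(email_subq, Provider.emails)
    page_obj = paginate(query, page, per_page=20)
    return render_template('providers.html', page=page_obj)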
Your question is how to use Flask-SQLAlchemy's Pagination with regular SQLAlchemy queries.
Since Flask-SQLAlchemy's BaseQuery object holds no state of its own, and is derived from SQLAlchemy's Query, and is really just a container for methods, you can use this hack:
from flask.ext.sqlalchemy import BaseQuery
def paginate(sa_query, page, per_page=20, error_out=True):
    sa_query.__class__ = BaseQuery
    # We can now use BaseQuery methods like .paginate on our SA query
    return sa_query.paginate(page, per_page, error_out)
To use:
@route(...)
def provider_and_email_view(page):
    provider_and_email = db.session.query(...)  # any SQLAlchemy query
    paginated_results = paginate(provider_and_email, page)
    return render_template('...', paginated_results=paginated_results)
*Edit:
Please be careful doing this. It's really just a way to avoid copying/pasting the paginate function, as seen in the other answer. Note that BaseQuery has no __init__ method. See How dangerous is setting self.__class__ to something else?
*Edit2:
If BaseQuery had an __init__, you could construct one using the SA query object, rather than hacking .__class__.
Hey, I have found a quick fix for this; here it is:
provider_and_email = Provider.query.with_entities(email_subq).\
    outerjoin(email_subq, Provider.emails).paginate(page, POST_PER_PAGE_LONG, False)
I'm currently using this approach:
query = BaseQuery([Provider, email_subq], db.session())
to create my own BaseQuery. db is the SQLAlchemy instance.
Update: as @afilbert suggests you can also do this:
query = BaseQuery(provider_and_email.subquery(), db.session())
How do you initialize your application with SQLAlchemy?
Probably your current SQLAlchemy connection has nothing to do with flask.ext.sqlalchemy and you are using plain SQLAlchemy.
Check this tutorial and verify that your imports really come from flask.ext.sqlalchemy:
http://pythonhosted.org/Flask-SQLAlchemy/quickstart.html#a-minimal-application
You can also paginate a plain list of results (note that page is zero-based here):
my_list = [my_list[i:i + per_page] for i in range(0, len(my_list), per_page)][page]
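For example, worked through with small numbers:
my_list = list(range(10))
per_page = 3
pages = [my_list[i:i + per_page] for i in range(0, len(my_list), per_page)]
# pages == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
page_items = pages[1]  # the second (zero-based) page: [3, 4, 5]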
I did this and it works:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
if page is not None:
    query = query.offset(page * page_size)
query = query.all()
I could be wrong, but I think your problem may be the .all(). By using that, you're getting a list, not a query object.
Try leaving it off, and pass your query to the pagination method like so (I left off all the subquery details for clarity's sake):
email_query = db.session.query(Emails).filter(**filters)
email_query.paginate(page, per_page)

Python, SQLAlchemy and Postgresql: understanding inheritance

Pretty new (but not a newborn) to Python, SQLAlchemy and PostgreSQL, and trying very hard to understand inheritance.
As I am taking over another programmer's code, I need to understand what is necessary, and where, for the inheritance concept to work.
My questions are:
Is it possible to rely only on SQLAlchemy for inheritance? In other words, can SQLAlchemy apply inheritance on Postgresql database tables that were created without specifying INHERITS=?
Is the declarative_base technology (SQLAlchemy) necessary to use inheritance the proper way? If so, we'll have to rewrite everything, so please don't discourage me.
Assuming we can use Table instances, empty entity classes and mapper(), could you give me a (very simple) example of how to go through the process properly (or a link to an easily understandable tutorial; I have not found any easy enough yet)?
The real-world domain we are working on is real estate objects. So we basically have:
- one table immobject(id, createtime)
- one table objectattribute(id, immoobject_id, oatype)
- several attribute tables: oa_attributename(oa_id, attributevalue)
Thanks for your help in advance.
Vincent
Welcome to Stack Overflow: in the future, if you have more than one question, you should create a separate post for each. Feel free to link them together if it might help provide context.
Table inheritance in Postgres is a very different thing and solves a different set of problems from class inheritance in Python, and SQLAlchemy makes no attempt to combine them.
When you use table inheritance in Postgres, you're doing some trickery at the schema level so that more elaborate constraints can be enforced than might be easy to express in other ways. Once you have designed your schema, applications aren't normally aware of the inheritance; if they insert a row, it just magically appears in the parent table (much like a view). This is useful, for instance, for making some kinds of bulk operations more efficient (you can just drop the table for the month of January).
This is a fundamentally different idea from inheritance as seen in OOP (in Python or otherwise, with relational persistence or otherwise). In that case, the application is aware that two types are related, and that the subtype is a permissible substitute for the supertype: "a holding is an address, a contact has an address, therefore a contact can have a holding."
Which of these (mostly orthogonal) tools you need depends on the application. You might need neither, you might need both.
SQLAlchemy's mechanisms for working with object inheritance are flexible and robust; you should use them in favor of a home-built solution if they are compatible with your particular needs (this should be true for almost all applications).
The declarative extension is a convenience; it allows you to describe the mapped table, the Python class and the mapping between the two in one 'thing' instead of three. It makes your code more "DRY"; it is, however, only a convenience layered on top of "classic SQLAlchemy", and it isn't necessary by any measure.
If you find that you need table inheritance that's visible from SQLAlchemy, your mapped classes won't be any different from not using those features; tables with inheritance are still normal relations (like tables or views) and can be mapped without knowledge of the inheritance in the Python code.
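To illustrate the classic (non-declarative) mapping process from question #3, a minimal sketch might look like this; the column definitions are illustrative guesses based on your immobject table, not your actual schema:
from sqlalchemy import Table, Column, Integer, DateTime, MetaData
from sqlalchemy.orm import mapper

metadata = MetaData()

# a plain Table definition; no declarative_base involved
immobject_table = Table('immobject', metadata,
    Column('id', Integer, primary_key=True),
    Column('createtime', DateTime))

class ImmObject(object):
    pass  # empty entity class

# classic mapping: associate the class with the table
mapper(ImmObject, immobject_table)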
For your #3, you don't necessarily have to declare empty entity classes to use mapper. If your application doesn't need fancy properties, you can just use introspection and metaclasses to model the existing tables without defining them. Here's what I did:
import sqlalchemy
import sqlalchemy.orm

mymetadata = sqlalchemy.MetaData()
myengine = sqlalchemy.create_engine(...)

def named_table(tablename):
    u"return a sqlalchemy.Table object given a SQL table name"
    return sqlalchemy.Table(tablename, mymetadata, autoload=True, autoload_with=myengine)

def new_bound_class(engine, table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a sqlalchemy.Table object"
    fieldnames = table.c.__dict__['_data']
    def format_attributes(obj, transform):
        attributes = [u'%s=%s' % (x, transform(getattr(obj, x))) for x in fieldnames]
        return u', '.join(attributes)
    class DynamicORMClass(object):
        def __init__(self, **kw):
            u"Keyword arguments may be used to initialize fields/columns"
            for key in kw:
                if key in fieldnames:
                    setattr(self, key, kw[key])
                else:
                    raise KeyError, '%s is not a valid field/column' % (key,)
        def __repr__(self):
            return u'%s(%s)' % (self.__class__.__name__, format_attributes(self, repr))
        def __str__(self):
            return u'%s(%s)' % (str(self.__class__), format_attributes(self, str))
    DynamicORMClass.__doc__ = u"This is a dynamic class created using SQLAlchemy based on table %s" % (table,)
    sqlalchemy.orm.mapper(DynamicORMClass, table)  # register the mapping
    return DynamicORMClass  # return the class itself, so it can be instantiated

def named_orm_class(table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a table name or object"
    if not isinstance(table, sqlalchemy.Table):
        table = named_table(table)
    return new_bound_class(myengine, table)
Example of use:
>>> myclass = named_orm_class('mytable')
>>> session = Session()
>>> obj = myclass(name='Fred', age=25, ...)
>>> session.add(obj)
>>> session.commit()
>>> print str(obj) # will print all column=value pairs
I beefed up my versions of new_bound_class and named_orm_class a little more with decorators, etc. to provide extra capabilities, and you can too. Of course, under the covers, it is declaring an empty entity class. But you don't have to do it, except this one time.
This will tide you over until you decide that you're tired of doing all those joins yourself and wonder why you can't just have an object attribute that runs a lazy select query against the related classes whenever you use it. That's when you make the leap to declarative (or Elixir).

Django: how to do get_or_create() in a threadsafe way?

In my Django app very often I need to do something similar to get_or_create(). E.g.,
A user submits a tag. I need to see if that tag is already in the database. If not, create a new record for it. If it is, just update the existing record.
But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
This must be a very common situation. How do I handle it in a threadsafe way?
Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:
This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
If you are using MySQL, be sure to use the READ COMMITTED isolation
level rather than REPEATABLE READ (the default), otherwise you may see
cases where get_or_create will raise an IntegrityError but the object
won’t appear in a subsequent get() call.
From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
Here's an example of how you could do it:
Define a model with either unique=True:
class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
... or by using unique_together:
class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

MyModel.objects.get_or_create(prefix=<user_prefix_here>, slug=<user_slug_here>, defaults={"name": <user_name_here>})
Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.
Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - try to create the object, catch the IntegrityError if one occurs, and return the existing copy in that case. In other words: let the database handle the atomicity.
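The pattern itself, reduced to a sketch (Tag is a hypothetical model whose name field has unique=True; this is not Django's actual code):
from django.db import IntegrityError, transaction

def get_or_create_tag(name):
    # optimistically try to create; on a duplicate,
    # fall back to fetching the existing row
    try:
        with transaction.atomic():
            return Tag.objects.create(name=name), True
    except IntegrityError:
        return Tag.objects.get(name=name), False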
This must be a very common situation. How do I handle it in a threadsafe way?
Yes.
The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.
If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
Django, however, has an ORM layer with its own cache. So the logic is inverted to make the common case work directly and quickly, and the uncommon case (the duplicate) raise a rare exception.
Try the transaction.commit_on_success decorator on the callable where you call get_or_create(**kwargs):
"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."
Apart from that, in concurrent calls to get_or_create, both threads try to get the object with the arguments passed to it (except for the "defaults" argument, which is a dict used during the create call in case get() fails to retrieve any object). If the get fails, both threads try to create the object, resulting in duplicate objects unless some unique/unique_together constraint is implemented at the database level on the field(s) used in the get() call.
It is similar to this post:
How do I deal with this race condition in Django?
So many years have passed, but nobody has written about threading.Lock. If you don't have the opportunity to make migrations for unique_together, for legacy reasons, you can use locks or threading.Semaphore objects. Here is the pseudocode:
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_lock = Lock()

def get_staff(data: dict):
    _lock.acquire()
    try:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff
    finally:
        _lock.release()

with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())
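As a side note, the acquire()/release() pair in a try/finally can be written more idiomatically with the lock as a context manager; also keep in mind that a threading.Lock only serializes calls within a single process, not across multiple server processes:
def get_staff(data: dict):
    with _lock:  # acquired on entry, released on exit, even if an exception is raised
        staff, created = MyModel.objects.get_or_create(**data)
        return staff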

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category = 'foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?
Time has passed and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. This method effectively does what is performed by the suggested qbind function, i.e. two queries are performed and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():  # served from the prefetch cache
        print choice
Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.
I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
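If you then want the choices grouped per poll without extra queries, you can do the join in Python yourself (a small sketch):
from collections import defaultdict

# group the fetched choices by their poll id
choices_by_poll = defaultdict(list)
for choice in choices:
    choices_by_poll[choice.poll_id].append(choice)

for poll in polls:
    poll_choices = choices_by_poll[poll.id]  # no extra SQL query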
I think what you are trying to do is known as "eager loading" of child data: loading the child list (choice_set) for each Poll in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is select_related - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried select_related but it didn't work. Can you try doing the select_related first and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.
