Forcing a SQLAlchemy ORM get() outside the identity map

Background
The get() method is special in SQLAlchemy's ORM because it tries to return objects from the identity map before issuing a SQL query to the database (see the documentation).
This is great for performance, but it can cause problems in distributed applications. Another process may have modified the object, the local process has no way of knowing that its copy is stale, and it will keep returning the stale object from the identity map every time get() is called.
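The short-circuit described above can be observed with a minimal sketch. It assumes SQLAlchemy 1.4 or later (where Session.get() replaces Query.get()), an in-memory SQLite database and a hypothetical Company model with an integer stock_price column. A before_cursor_execute listener records every statement that actually reaches the database: the first get() in a fresh session issues a SELECT, and the second returns the very same object from the identity map without emitting any SQL.

```python
from sqlalchemy import Column, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Company(Base):
    # Hypothetical model standing in for the question's Company class.
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    stock_price = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Company(id=123, stock_price=100))
    setup.commit()

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_sql(conn, cursor, statement, parameters, context, executemany):
    # Every statement sent to the database is recorded here.
    statements.append(statement)


with Session(engine) as session:
    first = session.get(Company, 123)   # not in the identity map yet: emits a SELECT
    sql_after_first = len(statements)
    second = session.get(Company, 123)  # found in the identity map: no SQL emitted
    sql_after_second = len(statements)
```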
Question
How can I force get() to ignore the identity map and issue a call to the DB every time?
Example
I have a Company object defined in the ORM.
I have a price_updater() process which updates the stock_price attribute of all the Company objects every second.
I have a buy_and_sell_stock() process which buys and sells stocks occasionally.
Now, inside this process, I may have loaded a microsoft = Company.query.get(123) object.
A few minutes later, I may issue another call for Company.query.get(123). The stock price has changed since then, but my buy_and_sell_stock() process is unaware of the change because it happened in another process.
Thus, the get(123) call returns the stale version of the Company from the session's identity map, which is a problem.
I've searched SO (under the [sqlalchemy] tag) and read the SQLAlchemy docs to try to figure out how to do this, but haven't found a way.

Using session.expire(my_instance) will cause the data to be re-selected on the next attribute access. However, even if you use expire (or expunge), the data fetched next is still subject to the transaction isolation level. See the PostgreSQL docs on isolation levels (the concepts apply to other databases as well) and the SQLAlchemy docs on setting isolation levels.
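The following sketch reproduces the stale read and the expire() fix with two sessions. It assumes SQLAlchemy 1.4+, a hypothetical Company model and a throwaway file-backed SQLite database (a file rather than :memory:, so that the two sessions really use two separate connections); the second session stands in for the price_updater() process. SQLite's Python driver does not open a transaction for plain SELECTs, so this demo does not exercise the isolation-level caveat: on MySQL/InnoDB under REPEATABLE READ, for example, the trader session would also have to end its transaction (commit() or rollback()) before the re-select could see the new price.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Company(Base):
    # Hypothetical model standing in for the question's Company class.
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    stock_price = Column(Integer)


# A file-backed SQLite database, so each session gets its own connection.
db_path = os.path.join(tempfile.mkdtemp(), "stocks.db")
engine = create_engine(f"sqlite:///{db_path}")
Base.metadata.create_all(engine)

with Session(engine) as setup:
    setup.add(Company(id=123, stock_price=100))
    setup.commit()

trader = Session(engine)  # plays the role of buy_and_sell_stock()
microsoft = trader.get(Company, 123)
price_before = microsoft.stock_price

# A second session plays the role of price_updater() and commits a new price.
with Session(engine) as updater:
    updater.get(Company, 123).stock_price = 250
    updater.commit()

# get() is served from the trader's identity map, so the price is stale.
stale_price = trader.get(Company, 123).stock_price
still_tracked = microsoft in trader  # the instance is still in the session

# expire() discards the loaded attribute values; the next access re-selects.
trader.expire(microsoft)
fresh_price = microsoft.stock_price
trader.close()
```

session.refresh(microsoft) would have the same effect, except that it emits the SELECT immediately instead of on the next attribute access.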
You can test whether an instance is in the session with the in operator: my_instance in session.
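You can use filter instead of get to bypass the identity-map lookup, but it is subject to the same isolation-level restriction:
Company.query.filter_by(id=123).one()
This always emits a SELECT, but if the object is already loaded in the session and not expired, the query returns that same instance without overwriting its loaded attributes; add populate_existing() to the query when you need them overwritten.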

Related

Will SQLAlchemy update the content of objects in the middle of a session?

In the SQLAlchemy docs it says this:
"When using a Session, it’s important to note that the objects which are associated with it are proxy objects to the transaction being held by the Session - there are a variety of events that will cause objects to re-access the database in order to keep synchronized. It is possible to “detach” objects from a Session, and to continue using them, though this practice has its caveats. It’s intended that usually, you’d re-associate detached objects with another Session when you want to work with them again, so that they can resume their normal task of representing database state."
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html]
If I am in the middle of a session in which I read some objects, do some manipulations and more queries and save some objects, before committing, is there a risk that changes to the dbase by other users will unexpectedly update my objects while I am working with them?
In other words, what are the "variety of events" referred to above?
Is the answer to set the transaction isolation level to the maximum? (I am using PostgreSQL with Flask-SQLAlchemy and Flask-Restful, if any of that matters.)
No, SQLAlchemy does not monitor the database for changes or update your objects whenever it feels like it; that would be a very expensive operation. The "variety of events" refers more to SQLAlchemy's internal state. I'm not familiar with all of these events, but for example, when objects are marked as expired, SQLAlchemy automatically reloads them from the database on the next access. One such case is calling session.commit() and then accessing any attribute of an object again.
More here: Documentation about expiring objects
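To make the expire-on-commit behaviour visible, the sketch below (assuming SQLAlchemy 1.4+ and a hypothetical Company model) uses inspect() to list which attributes are unloaded. Right after commit() the attributes are expired, and the next attribute access reloads them with a SELECT; nothing refreshes them in between on its own.

```python
from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Company(Base):
    # Hypothetical model used only for this illustration.
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    stock_price = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    acme = Company(id=1, stock_price=100)
    session.add(acme)
    session.commit()  # expire_on_commit=True (the default) expires every instance

    unloaded_after_commit = set(inspect(acme).unloaded)
    price = acme.stock_price  # attribute access triggers a SELECT that reloads it
    unloaded_after_access = set(inspect(acme).unloaded)
```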

Would using transactions in a Celery task in Django application cause problems?

I have a set of Celery tasks that I've written. Each of these tasks takes an author ID (just an example) as a parameter and, for each of the author's books, fetches the latest price and stores it in the database.
I'd like to add transactions to my tasks by applying Django's @transaction.commit_on_success decorator. If any task crashes, I'd like the whole task to fail and nothing to be saved to the database.
I have a dozen or so Celery workers that check the prices of books for an author, and I'm wondering whether this simple transactional logic would cause locking and race conditions in my Postgres database.
I've dug around and found a project called django-celery-transactions, but I still don't understand the real issue behind it or what the project is trying to solve.
The reasoning is that if you apply the decorator to a Django view, the DB transaction is not committed until the view has exited. Inside the view, before it returns and triggers the commit, you may invoke tasks that expect the transaction to already be committed, i.e. for those entries to already exist in the database.
To guard against this race condition (a task starting before your view, and consequently its transaction, has finished), you can either manage it manually or use the module you mentioned, which handles it automatically for you.
In your case, for instance, it could fail if you add a new author and have a task that fetches prices for any of that author's books. If the task executes before the transaction creating the new author has committed, it will try to fetch an Author with an ID that does not exist yet.
It depends on several things, including the transaction isolation level of your database, how frequently you check for price updates, and how often you expect prices to change. If, for example, you were making a very large number of updates per second against a stock PostgreSQL setup (whose default isolation level is READ COMMITTED), you might get different results when executing the same SELECT statement several times within one transaction.
Databases are optimized to handle concurrency, so I don't think this is going to be a problem for you, especially if you don't open the transaction until after fetching prices (i.e. use a context manager rather than decorating the task). If, for some reason, things get slow in the future, optimize then (fetch prices less frequently, tweak the database configuration, etc.).
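A minimal sketch of that narrower transaction scope, using the standard-library sqlite3 module in place of Django's ORM (in Django, transaction.atomic() used as a context manager plays the same role). The fetch_latest_price() function and the book table are hypothetical. The slow price lookups run before any transaction is opened; the writes then happen in one short transaction that either commits every update or, if an exception escapes, rolls them all back.

```python
import sqlite3


def fetch_latest_price(book_id):
    # Hypothetical stand-in for a slow call to an external pricing service.
    return 1000 + book_id


def update_prices_for_author(conn, author_id):
    book_ids = [
        row[0]
        for row in conn.execute("SELECT id FROM book WHERE author_id = ?", (author_id,))
    ]
    # Network work first, while no write transaction (and no write lock) is held.
    prices = {book_id: fetch_latest_price(book_id) for book_id in book_ids}
    # `with conn:` commits on success and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "UPDATE book SET price = ? WHERE id = ?",
            [(price, book_id) for book_id, price in prices.items()],
        )
    return prices
```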
As for your other question: django-celery-transactions aims to prevent race conditions between Django and Celery. One example is passing the primary key of a newly created object to a task: the task may try to retrieve the object before the view's transaction has been committed. Boom!
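The idea behind django-celery-transactions (and Django's later built-in transaction.on_commit()) is to hold back task dispatch until the surrounding transaction has committed. Since the rest of this page is about SQLAlchemy, the sketch below shows the same pattern with SQLAlchemy's after_commit and after_rollback session events rather than Django. The Author model, fetch_prices_task, the dispatched list and enqueue_after_commit() are hypothetical; real code would call something like task.delay() where the sketch appends to a list.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Author(Base):
    # Hypothetical model used only for this illustration.
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
SessionFactory = sessionmaker(bind=engine)

dispatched = []  # hypothetical stand-in for a Celery broker queue


def fetch_prices_task(author_id):
    # Hypothetical: real code would enqueue a Celery task here.
    dispatched.append(author_id)


def enqueue_after_commit(session, func, *args):
    # Park the call in session.info until the transaction outcome is known.
    session.info.setdefault("pending_tasks", []).append((func, args))


@event.listens_for(SessionFactory, "after_commit")
def run_pending_tasks(session):
    # The data is committed now, so a worker will be able to see it.
    for func, args in session.info.pop("pending_tasks", []):
        func(*args)


@event.listens_for(SessionFactory, "after_rollback")
def drop_pending_tasks(session):
    # Nothing was committed, so the queued tasks must not run.
    session.info.pop("pending_tasks", None)
```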

SQLAlchemy/Pyramid DBSession refresh issue

Here's my scenario:
The first view renders a form, the data goes to a second view, where I store it in the DB (MySQL) and redirect to a third view, which shows what was written to the DB:
Storing to the DB:
DBSession.add(object)
transaction.commit()
DB Session:
DBSession = scoped_session(sessionmaker(expire_on_commit=False,
                                        autocommit=False,
                                        extension=ZopeTransactionExtension()))
After that, when I refresh the page several times, I sometimes see the DB change and sometimes don't: old data one time, new data the next, and so on...
When I restart server (locally, pserve) DB data is up-to-date.
Maybe it's a matter of creating session?
Check MySQL's transaction isolation level.
The default for InnoDB is REPEATABLE READ: "All consistent reads within the same transaction read the snapshot established by the first read."
You can specify the isolation level in the call to create_engine. See the SQLAlchemy docs.
I suggest you try the READ COMMITTED isolation level and see if that fixes your problem.
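A small sketch of that suggestion, assuming SQLAlchemy's isolation_level parameter to create_engine(). The make_engine() helper is hypothetical, and the MySQL URL in the comment is a placeholder that also needs a driver such as PyMySQL installed. The accepted level names depend on the dialect: MySQL accepts "READ COMMITTED", while SQLite only offers levels such as "SERIALIZABLE" and "READ UNCOMMITTED".

```python
from sqlalchemy import create_engine


def make_engine(url, isolation_level="READ COMMITTED"):
    # Every connection handed out by this engine uses the given isolation level.
    # Under READ COMMITTED each consistent read sees data committed before it
    # started, instead of the snapshot taken by the transaction's first read.
    return create_engine(url, isolation_level=isolation_level)


# Hypothetical URL; requires a MySQL server and the PyMySQL driver:
# engine = make_engine("mysql+pymysql://user:password@localhost/mydb")
```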
It's not clear exactly what your transaction object is or how it connects to the SQLAlchemy database session. I couldn't see anything about transactions in the Pyramid docs and I don't see anything in your code that links your transaction object to your SQLAlchemy session so maybe there is some configuration missing. What example are you basing this code on?
Also: the sessionmaker call is normally made at module scope to create a single session factory, which is then used repeatedly to create session objects from the same source. "the sessionmaker() function is normally used to create a top level Session configuration which can then be used throughout an application without the need to repeat the configurational arguments."
It may be the case that since you are creating multiple session factories that there is some data that is supposed to be shared across sessions but is actually not shared because it is created once per factory. Try just calling sessionmaker once and see if that makes a difference.
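A minimal sketch of creating the session factory once, at module level, and opening a short-lived session per request. It assumes SQLAlchemy 1.4+ (where Session supports the with statement); the handle_request() function and the in-memory SQLite URL are hypothetical placeholders.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Created once, at module scope, and shared by the whole application.
engine = create_engine("sqlite://")
SessionFactory = sessionmaker(bind=engine)


def handle_request():
    # A new session per request; closing it releases its connection, rolls back
    # any open transaction and discards its identity map, so nothing stale
    # carries over to the next request.
    with SessionFactory() as session:
        return session.execute(text("SELECT 1")).scalar_one()
```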
I believe your issue is likely a persistent session. By default, SQLAlchemy's Session expires all objects after a commit; this means SQLA will fetch them from the database the next time you access them, and they will be fresh.
You have overridden this default by passing expire_on_commit=False, so make sure that after committing a change you call session.expire_all() if you intend for that session object to load fresh data on subsequent requests. (The scoped DBSession object is shared by all requests in Pyramid, but each thread gets its own underlying session, so you aren't guaranteed to get the same session on every request.) I recommend not setting expire_on_commit to False, or using a non-global session: see http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/database/sqlalchemy.html#using-a-non-global-session
Alternatively, you could make sure you are expiring objects when necessary, knowing that unexpired objects will stay in memory the way they are and will not be refreshed, and may differ from the same object in a different thread-scoped session.
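The effect of expire_on_commit=False, and of calling expire_all() to undo it, can be seen with inspect(). This sketch assumes SQLAlchemy 1.4+ and a hypothetical Page model: with the flag set, objects keep their loaded values after commit (and so can go stale), while expire_all() marks every attribute as unloaded so the next access re-selects it.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Page(Base):
    # Hypothetical model used only for this illustration.
    __tablename__ = "page"
    id = Column(Integer, primary_key=True)
    body = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
# Same expire_on_commit setting as the question's DBSession.
SessionFactory = sessionmaker(bind=engine, expire_on_commit=False)

with SessionFactory() as session:
    page = Page(id=1, body="draft")
    session.add(page)
    session.commit()
    # Nothing was expired: the in-memory values are kept and may go stale.
    unloaded_after_commit = set(inspect(page).unloaded)

    session.expire_all()
    # Every attribute is now unloaded; the next access emits a fresh SELECT.
    unloaded_after_expire = set(inspect(page).unloaded)
    body = page.body
```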
The problem is that you're setting expire_on_commit=False. If you remove that, it should work. You can read more about what it does on http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#sqlalchemy.orm.session.Session.commit

How to avoid caching in sqlalchemy?

I have a problem with SQLAlchemy: my app runs as a constantly running Python application.
I have a function like this:
from sqlalchemy import select

def myFunction(self, param1):
    s = select([statsModel.c.STA_ID, statsModel.c.STA_DATE]) \
        .select_from(statsModel)
    statsResult = self.connection.execute(s).fetchall()
    return {'result': statsResult, 'calculation': param1}
I think this is a clear example: one result set is fetched from the database, and the second value is just passed in as an argument.
The problem is that when I change data in my database, this function still returns data as if nothing had changed. When I change the input parameter, the returned "calculation" value is correct.
When I restart the app server, things go back to normal: new data is fetched from MySQL.
I know that there were several questions about SQLAlchemy caching like:
How to disable caching correctly in Sqlalchemy orm session?
How to disable SQLAlchemy caching?
but what else could explain this situation? It seems SQLAlchemy keeps the previously fetched data and does not perform new queries until the application restarts. How can I avoid this behavior?
Calling session.expire_all() will evict all database-loaded data from the session. Any subsequent access of object attributes emits a new SELECT statement and gets new data back. Please see http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#refreshing-expiring for background.
If you still see so-called "caching" after calling expire_all(), then you need to close out transactions as described in my answer linked above.
A few possibilities.
You are reusing your session improperly or at improper time. Best practice is to throw away your session after you commit, and get a new one at the last possible moment before you use it. The behavior that appears to be caching may in fact be due to a session lifetime being very long in your application.
Objects that survive longer than the session are not being merged into a subsequent session. The "metadata" may not be able to update their state if you do not merge them back in. This is more a concern for the ORM API of SQLAlchemy, which you do not appear so far to be using.
Your changes are not committed. You say they are so we'll assume this is not it, but if none of the other avenues explain it you may want to look again.
One general debugging tip: if you want to know exactly what SQLAlchemy is doing in the database, pass echo=True to the create_engine function. The engine will print all queries it runs.
Also check out this suggestion I made to someone else, who was using ORM and had transactionality problems, which resolved their issue without ever pinpointing it. Maybe it will help you.
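Combining the suggestions above, here is a hedged rewrite of the question's function that opens a connection per call instead of reusing self.connection, with echo=True so every SELECT is logged. It assumes SQLAlchemy 1.4+ (select() taking columns positionally), and the stats table is a hypothetical reconstruction of statsModel. Closing the connection returns it to the pool and rolls back its transaction, so on MySQL/InnoDB a REPEATABLE READ snapshot cannot outlive a single call.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, select

metadata = MetaData()
# Hypothetical reconstruction of the question's statsModel table.
stats = Table(
    "stats",
    metadata,
    Column("STA_ID", Integer, primary_key=True),
    Column("STA_DATE", Integer),
)

# echo=True logs every SQL statement the engine sends to the database.
engine = create_engine("sqlite://", echo=True)
metadata.create_all(engine)


def my_function(param1):
    # A fresh connection per call; closing it rolls back its transaction, so
    # the next call starts from a new snapshot instead of reusing an old one.
    with engine.connect() as conn:
        query = select(stats.c.STA_ID, stats.c.STA_DATE).order_by(stats.c.STA_ID)
        rows = conn.execute(query).fetchall()
    return {"result": rows, "calculation": param1}
```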
You need to change the transaction isolation level to READ COMMITTED:
http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#mysql-isolation-level

SQLAlchemy Event interface

I'm using SQLAlchemy 0.7. I would like some 'post-processing' to occur after a session.flush(): I need to access the instances involved in the flush() and iterate through them. The flush() call will update the database, but the instances involved also store some data in an LDAP database, and I would like SQLAlchemy to trigger an update to that LDAP database by calling an instance method.
I figured I'd use the after_flush(session, flush_context) event, detailed here, but how do I get a list of update()'d instances?
On a side note, how can I determine which columns have changed (or are 'dirty') on an instance? I've been able to find out whether an instance as a whole is dirty, but not individual properties.
According to the link you provided:
Note that the session’s state is still in pre-flush, i.e. ‘new’, ‘dirty’, and ‘deleted’ lists still show pre-flush state as well as the history settings on instance attributes.
This means that you should be able to access all the dirty objects through the session.dirty collection. Note that the first parameter of the event callback is the current session object.
As for the second part, you can use the sqlalchemy.orm.attributes.get_history function to figure out which columns have been changed. It returns a History object for a given attribute which contains a has_changes() method.
If you're trying to listen for changes on specific class attributes, consider using Attribute Events instead.
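A minimal sketch of both answers together, assuming SQLAlchemy 1.4 or later rather than the question's 0.7. The User model, its sync_to_ldap() method and the ldap_calls list are hypothetical stand-ins for the LDAP update. The after_flush listener walks session.dirty, which still shows pre-flush state at that point, and uses get_history() to collect only the columns whose values actually changed.

```python
from sqlalchemy import Column, Integer, String, create_engine, event, inspect
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.attributes import get_history

Base = declarative_base()
ldap_calls = []  # hypothetical record of what would be pushed to LDAP


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

    def sync_to_ldap(self, changes):
        # Hypothetical hook: a real implementation would update LDAP here.
        ldap_calls.append((self.id, changes))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
SessionFactory = sessionmaker(bind=engine)


@event.listens_for(SessionFactory, "after_flush")
def push_changes_to_ldap(session, flush_context):
    # In after_flush, session.dirty and attribute history still show pre-flush state.
    for obj in session.dirty:
        changes = {}
        for prop in inspect(obj).mapper.column_attrs:
            history = get_history(obj, prop.key)
            if history.has_changes():
                # (old values, new values) for this column
                changes[prop.key] = (list(history.deleted), list(history.added))
        if changes and hasattr(obj, "sync_to_ldap"):
            obj.sync_to_ldap(changes)
```

The old value only shows up in history.deleted if it was loaded before being changed, which is why the usage below reads an attribute after the first commit (which expired the object) before modifying email.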
