SQLAlchemy/Pyramid DBSession refresh issue - python

Here's my scenario:
First view renders the form, the data goes to a second view, where I store it in the DB (MySQL) and redirect to a third view, which shows what was written to the DB:
Storing to DB:
DBSession.add(object)
transaction.commit()
DB Session:
DBSession = scoped_session(sessionmaker(expire_on_commit=False,
                                        autocommit=False,
                                        extension=ZopeTransactionExtension()))
After that, when I refresh my page several times, sometimes I see the DB change and sometimes not: one time old data, the next time new data, and so on...
When I restart the server (locally, pserve), the DB data is up-to-date.
Maybe it's a matter of how the session is created?

Check MySQL's transaction isolation level.
The default for InnoDB is REPEATABLE READ: "All consistent reads within the same transaction read the snapshot established by the first read."
You can specify the isolation level in the call to create_engine. See the SQLAlchemy docs.
I suggest you try the READ COMMITTED isolation level and see if that fixes your problem.
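For example, a minimal sketch of setting it engine-wide (the connection URL and credentials are placeholders, not from the question):
from sqlalchemy import create_engine

# READ COMMITTED lets each new read see rows committed by other
# transactions, instead of the snapshot taken at the first read.
engine = create_engine(
    "mysql://user:password@localhost/mydb",  # hypothetical URL
    isolation_level="READ COMMITTED",
)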

It's not clear exactly what your transaction object is or how it connects to the SQLAlchemy database session. I couldn't see anything about transactions in the Pyramid docs and I don't see anything in your code that links your transaction object to your SQLAlchemy session so maybe there is some configuration missing. What example are you basing this code on?
Also: the sessionmaker call is normally done at file scope to create a single session factory, which is then used repeatedly to create session objects from the same source. "the sessionmaker() function is normally used to create a top level Session configuration which can then be used throughout an application without the need to repeat the configurational arguments."
It may be that since you are creating multiple session factories, some data that is supposed to be shared across sessions is actually not shared, because it is created once per factory. Try calling sessionmaker just once and see if that makes a difference.
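A minimal sketch of that pattern (the module layout is an assumption, not taken from the question; the imports mirror the question's own code):
# models.py - build the factory once, at module scope
from sqlalchemy.orm import scoped_session, sessionmaker
from zope.sqlalchemy import ZopeTransactionExtension

DBSession = scoped_session(sessionmaker(
    autocommit=False,
    extension=ZopeTransactionExtension(),
))
# every other module imports and reuses this same DBSession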

I believe that your issue is likely to be a persistent session. By default, Pyramid expires all objects in the session after a commit -- this means that SQLAlchemy will fetch them from the database the next time you want them, and they will be fresh.
You have overridden this default by specifying expire_on_commit=False -- so make sure that after committing a change you call session.expire_all() if you intend for that session object to grab fresh data on subsequent requests. (The session object is the same for multiple requests in Pyramid, but you aren't guaranteed to get the same thread-scoped session.) I recommend not setting expire_on_commit to False, or using a non-global session: see http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/database/sqlalchemy.html#using-a-non-global-session
Alternatively, you could make sure you are expiring objects when necessary, knowing that unexpired objects will stay in memory as they are, will not be refreshed, and may differ from the same object in a different thread-scoped session.
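A hedged sketch of the expire-after-commit approach, reusing the DBSession and transaction names from the question (obj stands in for whatever instance you saved):
DBSession.add(obj)
transaction.commit()
DBSession.expire_all()  # the next attribute access emits a fresh SELECT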

The problem is that you're setting expire_on_commit=False. If you remove that, it should work. You can read more about what it does on http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#sqlalchemy.orm.session.Session.commit

Related

How do I manually commit a sqlalchemy database transaction inside a pyramid web app?

I have a Pyramid web app that needs to run a Celery task after committing changes to a sqlalchemy database. I know I can do this using request.tm.get().addAfterCommitHook(). However, that doesn't work for me because I also need to use the task_id of the celery task inside the view. Therefore I need to commit changes to the database before I call task.delay() on my Celery task.
The zope.sqlalchemy documentation says that I can manually commit using transaction.commit(). However, this does not work for me; the celery task runs before the changes are committed to the database, even though I called transaction.commit() before I called task.delay().
My Pyramid view code looks like this:
ride = appstruct_to_ride(dbsession, appstruct)
dbsession.add(ride)
# Flush dbsession so ride gets an id assignment
dbsession.flush()
# Store ride id
ride_id = ride.id
log.info('Created ride {}'.format(ride_id))
# Commit ride to database
import transaction
transaction.commit()
# Queue a task to update ride's weather data
from ..processing.weather import update_ride_weather
update_weather_task = update_ride_weather.delay(ride_id)
url = self.request.route_url('rides')
return HTTPFound(
    url,
    content_type='application/json',
    charset='',
    text=json.dumps(
        {'ride_id': ride_id,
         'update_weather_task_id': update_weather_task.task_id}))
My celery task looks like this:
@celery.task(bind=True, ignore_result=False)
def update_ride_weather(self, ride_id, train_model=True):
    from ..celery import session_factory
    logger.debug('Received update weather task for ride {}'.format(ride_id))
    dbsession = session_factory()
    dbsession.expire_on_commit = False
    with transaction.manager:
        ride = dbsession.query(Ride).filter(Ride.id == ride_id).one()
The celery task fails with NoResultFound:
File "/app/cycling_data/processing/weather.py", line 478, in update_ride_weather
ride=dbsession.query(Ride).filter(Ride.id==ride_id).one()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3282, in one
raise orm_exc.NoResultFound("No row was found for one()")
When I inspect the database after the fact, I see that the record was in fact created, after the celery task ran and failed. So this means that transaction.commit() did not commit the transaction as expected; the changes were instead committed automatically by the zope.sqlalchemy machinery after the view returned. How do I commit a transaction manually inside my view code?
request.tm is defined by pyramid_tm and could be the threadlocal transaction.manager object or a per-request object, depending on how you've configured pyramid_tm (look for pyramid_tm.manager_hook being defined somewhere to determine which one is being used).
Your question is tricky because whatever you do should fit into pyramid_tm and how it expects things to operate. Specifically, it plans to control a transaction around the lifecycle of the request, so committing early does not play well with that transaction. pyramid_tm is trying to provide a failsafe ability to roll back the entire request if a failure occurs anywhere in the request's lifecycle - not just in your view callable.
Option 1:
Commit early anyway. If you're going to do this, then failures after the commit cannot roll back the committed data, so you could have requests partially committed. OK, fine, that's your question, so the answer is to use request.tm.commit(), probably followed by a request.tm.begin() to start a new transaction for any subsequent changes. You'll also need to be careful not to share sqlalchemy-managed objects across that boundary, like request.user, etc., as they need to be refreshed/merged into the new transaction (SQLAlchemy's identity cache cannot trust data loaded from a different transaction by default, because that's just how isolation levels work). A rough sketch follows.
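The sketch reuses names from the question; treat it as an outline, not a drop-in implementation:
ride = appstruct_to_ride(dbsession, appstruct)
dbsession.add(ride)
dbsession.flush()
ride_id = ride.id
request.tm.commit()  # commit the request's transaction early
request.tm.begin()   # start a fresh transaction for the rest of the request
# the row is committed, so the background worker can safely see it
update_weather_task = update_ride_weather.delay(ride_id)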
Option 2:
Start a separate transaction just for the data you want to commit early. OK, so assuming you're not using any threadlocals like transaction.manager or scoped_session, you can probably start your own transaction and commit it without touching the dbsession that is being controlled by pyramid_tm. Some generic code that works with the pyramid-cookiecutter-starter project structure could be:
from myapp.models import get_tm_session

tmp_tm = transaction.TransactionManager(explicit=True)
with tmp_tm:
    dbsession_factory = request.registry['dbsession_factory']
    tmp_dbsession = get_tm_session(dbsession_factory, tmp_tm)
    # ... do stuff with tmp_dbsession that is committed in this with-statement
    ride = appstruct_to_ride(tmp_dbsession, appstruct)
    # do not use this ride object outside of the with-statement
    tmp_dbsession.add(ride)
    tmp_dbsession.flush()
    ride_id = ride.id

# we are now committed, so go ahead and start your background worker
update_weather_task = update_ride_weather.delay(ride_id)

# maybe you want the ride object outside of the tmp_dbsession
ride = dbsession.query(Ride).filter(Ride.id == ride_id).one()
return {...}
This isn't bad - probably about the best you can do as far as failure modes go, without hooking celery into the pyramid_tm-controlled dbsession.

Forcing a sqlalchemy ORM get() outside identity map

Background
The get() method is special in SQLAlchemy's ORM because it tries to return objects from the identity map before issuing a SQL query to the database (see the documentation).
This is great for performance, but can cause problems for distributed applications, because an object may have been modified by another process; the local process has no way of knowing that the object is stale and will keep retrieving the outdated object from the identity map when get() is called.
Question
How can I force get() to ignore the identity map and issue a call to the DB every time?
Example
I have a Company object defined in the ORM.
I have a price_updater() process which updates the stock_price attribute of all the Company objects every second.
I have a buy_and_sell_stock() process which buys and sells stocks occasionally.
Now, inside this process, I may have loaded a microsoft = Company.query.get(123) object.
A few minutes later, I may issue another call for Company.query.get(123). The stock price has changed since then, but my buy_and_sell_stock() process is unaware of the change because it happened in another process.
Thus, the get(123) call returns the stale version of the Company from the session's identity map, which is a problem.
I've done a search on SO (under the [sqlalchemy] tag) and read the SQLAlchemy docs to try to figure out how to do this, but haven't found a way.
Using session.expire(my_instance) will cause the data to be re-selected on access. However, even if you use expire (or expunge), the next data that is fetched will be based on the transaction isolation level. See the PostgreSQL docs on isolation levels (it applies to other databases as well) and the SQLAlchemy docs on setting isolation levels.
You can test whether an instance is in the session with the in operator: my_instance in session.
You can use filter instead of get to bypass the cache, but it still has the same isolation level restriction.
Company.query.filter_by(id=123).one()
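A small sketch combining both suggestions, using the names from the question (the reloaded data is still subject to the isolation level):
session.expire(microsoft)           # mark the cached state stale
microsoft = Company.query.get(123)  # the expired row is re-selected
# or skip get() and its identity-map shortcut entirely:
fresh = Company.query.filter_by(id=123).one()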

Will SQLAlchemy update the content of objects in the middle of a session?

In the SQLAlchemy docs it says this:
"When using a Session, it’s important to note that the objects which are associated with it are proxy objects to the transaction being held by the Session - there are a variety of events that will cause objects to re-access the database in order to keep synchronized. It is possible to “detach” objects from a Session, and to continue using them, though this practice has its caveats. It’s intended that usually, you’d re-associate detached objects with another Session when you want to work with them again, so that they can resume their normal task of representing database state."
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html]
If I am in the middle of a session in which I read some objects, do some manipulations and more queries, and save some objects before committing, is there a risk that changes to the database by other users will unexpectedly update my objects while I am working with them?
In other words, what are the "variety of events" referred to above?
Is the answer to set the transaction isolation level to maximum? (I am using PostgreSQL with Flask-SQLAlchemy and Flask-Restful, if any of that matters.)
No, SQLAlchemy does not monitor the database for changes or update your objects whenever it feels like it. I imagine that would be quite an expensive operation. The "variety of events" refers more to SQLAlchemy's internal state. I'm not familiar with all the "events", but for example, when objects are marked as expired, SQLAlchemy automatically reloads them from the database. One such case is calling session.commit() and then accessing any object's property again.
More here: Documentation about expiring objects
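A minimal sketch of that commit-then-access case (User is a stand-in model, not from the question):
user = session.query(User).get(1)
print(user.name)  # value loaded in the current transaction
session.commit()  # by default, all instances are expired here
print(user.name)  # this access emits a fresh SELECT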

How to avoid caching in sqlalchemy?

I have a problem with SQLAlchemy - my app runs as a long-lived Python process.
I have a function like this:
def myFunction(self, param1):
    s = select([statsModel.c.STA_ID, statsModel.c.STA_DATE])\
        .select_from(statsModel)
    statsResult = self.connection.execute(s).fetchall()
    return {'result': statsResult, 'calculation': param1}
I think this is a clear example - one result set is fetched from the database, the second is just passed through as an argument.
The problem is that when I change data in my database, this function still returns data as if nothing had changed. When I change the input parameter, the returned "calculation" value is correct.
When I restart the app server, the situation returns to normal - new data is fetched from MySQL.
I know that there were several questions about SQLAlchemy caching like:
How to disable caching correctly in Sqlalchemy orm session?
How to disable SQLAlchemy caching?
but how else can I describe this situation? It seems SQLAlchemy keeps the data fetched before and does not perform new queries until the application restarts. How can I avoid such behavior?
Calling session.expire_all() will evict all database-loaded data from the session. Any subsequent access of object attributes emits a new SELECT statement and gets new data back. Please see http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#refreshing-expiring for background.
If you still see so-called "caching" after calling expire_all(), then you need to close out transactions as described in my answer linked above.
A few possibilities.
You are reusing your session improperly or at an improper time. Best practice is to throw away your session after you commit, and get a new one at the last possible moment before you use it. The behavior that appears to be caching may in fact be due to the session's lifetime being very long in your application.
Objects that survive longer than the session are not being merged into a subsequent session. Their state may not be updated if you do not merge them back in. This is more a concern for the ORM API of SQLAlchemy, which so far you do not appear to be using.
Your changes are not committed. You say they are, so we'll assume this is not it, but if none of the other avenues explain it, you may want to look again.
One general debugging tip: if you want to know exactly what SQLAlchemy is doing in the database, pass echo=True to the create_engine function. The engine will print all queries it runs.
Also check out this suggestion I made to someone else, who was using ORM and had transactionality problems, which resolved their issue without ever pinpointing it. Maybe it will help you.
You need to change the transaction isolation level to READ COMMITTED:
http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#mysql-isolation-level
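A sketch of what that looks like, per the linked MySQL dialect docs (the connection URL is a placeholder):
engine = create_engine(
    "mysql://user:password@localhost/mydb",
    isolation_level="READ COMMITTED",
)
# or override it for a single connection:
conn = engine.connect().execution_options(isolation_level="READ COMMITTED")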

How to disable SQLAlchemy caching?

I have a caching problem when I use sqlalchemy.
I use sqlalchemy to insert data into a MySQL database. Then, I have another application process this data, and update it directly.
But sqlalchemy always returns the old data rather than the updated data. I think sqlalchemy cached my request ... so ... how should I disable it?
The usual cause for people thinking there's a "cache" at play, besides the usual SQLAlchemy identity map which is local to a transaction, is that they are observing the effects of transaction isolation. SQLAlchemy's session works by default in a transactional mode, meaning it waits until session.commit() is called in order to persist data to the database. During this time, other transactions in progress elsewhere will not see this data.
However, due to the isolated nature of transactions, there's an extra twist. Those other transactions in progress will not only not see your transaction's data until it is committed, they also can't see it in some cases until they are committed or rolled back also (which is the same effect your close() is having here). A transaction with an average degree of isolation will hold onto the state that it has loaded thus far, and keep giving you that same state local to the transaction even though the real data has changed - this is called repeatable reads in transaction isolation parlance.
http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
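A hedged illustration of that repeatable-read effect (Session and Thing are placeholder names; the exact behaviour depends on your database and its configured isolation level):
s1, s2 = Session(), Session()
thing = s1.query(Thing).get(1)   # s1's snapshot begins with this read
other = s2.query(Thing).get(1)
other.value = 'new'
s2.commit()
s1.refresh(thing)
print(thing.value)               # still the old value under REPEATABLE READ
s1.commit()                      # ends s1's transaction
print(s1.query(Thing).get(1).value)  # a new transaction sees 'new'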
This issue has been really frustrating for me, but I have finally figured it out.
I have a Flask/SQLAlchemy Application running alongside an older PHP site. The PHP site would write to the database and SQLAlchemy would not be aware of any changes.
I tried the sessionmaker setting autoflush=True unsuccessfully.
I tried db_session.flush(), db_session.expire_all(), and db_session.commit() before querying, and NONE worked. It still showed stale data.
Finally I came across this section of the SQLAlchemy docs: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#transaction-isolation-level
Setting the isolation_level worked great. Now my Flask app is "talking" to the PHP app. Here's the code:
engine = create_engine(
    "postgresql+pg8000://scott:tiger@localhost/test",
    isolation_level="READ UNCOMMITTED"
)
When the SQLAlchemy engine is started with the "READ UNCOMMITTED" isolation_level, it will perform "dirty reads", which means it will read uncommitted changes directly from the database.
Hope this helps
Here is a possible solution, courtesy of AaronD in the comments:
from flask.ext.sqlalchemy import SQLAlchemy

class UnlockedAlchemy(SQLAlchemy):
    def apply_driver_hacks(self, app, info, options):
        if "isolation_level" not in options:
            options["isolation_level"] = "READ COMMITTED"
        return super(UnlockedAlchemy, self).apply_driver_hacks(app, info, options)
In addition to zzzeek's excellent answer:
I had a similar issue. I solved the problem by using short-lived sessions.
from contextlib import closing

with closing(new_session()) as sess:
    ...  # do your stuff
I used a fresh session per task, task group, or request (in the case of a web app). That solved the "caching" problem for me.
This material was very useful for me:
When do I construct a Session, when do I commit it, and when do I close it
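For completeness, one way the new_session helper above could be wired up (the factory code is an assumption, not part of the original answer):
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)  # engine created at application startup

def new_session():
    return Session()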
This was happening in my Flask application, and my solution was to expire all objects in the session after every request.
from flask.signals import request_finished

def expire_session(sender, response, **extra):
    app.db.session.expire_all()

request_finished.connect(expire_session, flask_app)
Worked like a charm.
I have tried session.commit() and session.flush(); neither worked for me.
After going through the sqlalchemy source code, I found a way to disable the caching.
Setting query_cache_size=0 in create_engine worked:
create_engine(connection_string, convert_unicode=True, echo=True, query_cache_size=0)
First, there is no cache in SQLAlchemy.
Based on your method of fetching data from the DB, you should run some tests after the database is updated by others to see whether you can get the new data.
(1) use connection:
connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print("username:", row['username'])
connection.close()
(2) use Engine ...
(3) use MetaData...
Please follow the steps in http://docs.sqlalchemy.org/en/latest/core/connections.html
Another possible reason is that your MySQL DB is not actually being updated persistently. Restart the MySQL service and check.
As far as I know, SQLAlchemy does not cache data, so you need to look at the logging output.
