sqlalchemy caching some queries - python

I have this running on a live website. When a user logs in I query his profile to see how many "credits" he has available. Credits are purchased via PayPal. If a person buys credits and the payment comes through, the query still shows 0 credits, even though running the same query in phpMyAdmin returns the right result. If I restart the Apache web server and reload the page, the right number of credits is shown. Here's my mapper code which shows the number of credits each user has:
mapper(User, users_table, order_by='user.date_added DESC, user.id DESC', properties={
    'userCreditsCount': column_property(
        select(
            [func.ifnull(func.sum(orders_table.c.quantity), 0)],
            orders_table.c.user_id == users_table.c.id
        ).where(and_(
            orders_table.c.date_added > get_order_expire_limit(),  # order must not be older than a month
            orders_table.c.status == STATUS_COMPLETED
        )).label('userCreditsCount'),
        deferred=True
    )
    # other properties....
})
I'm using SQLAlchemy with the Flask framework, but not their Flask-SQLAlchemy package (just plain SQLAlchemy).
Here's how I initialize my database:
engine = create_engine(config.DATABASE_URI, pool_recycle=True)
metadata = MetaData()
db_session = scoped_session(sessionmaker(bind=engine, autoflush=True, autocommit=False))
I learned both Python and SQLAlchemy on this project, so I may be missing things, but this one is driving me nuts. Any ideas?

When you work with a Session, as soon as it starts working with a connection, it holds onto that connection until commit(), rollback() or close() is called. With the DBAPI, the connection to the database also remains in a transaction until the transaction is committed or rolled back.
In this case, when you've loaded data into your Session, SQLAlchemy doesn't refresh the data until the transaction is ended (or until you explicitly expire some part of the data with expire()). This is the natural behavior, since due to transaction isolation it's very likely that the current transaction cannot see changes that have occurred since it started in any case.
So while using expire() or refresh() may or may not be part of how to get the latest data into your Session, really you need to end your transaction and start a new one to truly see what's been changed elsewhere since that transaction started. You should organize your application so that a particular Session() is ready to go when a new request comes in, but when that request completes, the Session() should be closed out, and a new one (or at least a new transaction) started up on the next request.
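For example, a minimal sketch of that per-request cleanup, assuming Flask (app is your Flask application object, which isn't shown in the question) and the scoped_session named db_session defined above:

@app.teardown_appcontext
def shutdown_session(exception=None):
    # remove() closes the request's Session, ending its transaction and
    # releasing the connection; the next request starts a fresh transaction
    # and can see rows committed elsewhere (e.g. completed PayPal orders)
    db_session.remove()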

Please try to call refresh or expire on your object before accessing the field userCreditsCount:
user1 = session.query(User).get(1)
# ...
session.refresh(user1, ('userCreditsCount',))
This will make the query execute again (when refresh is called).
However, depending on the isolation level your transaction uses, it might not resolve the problem, in which case you may need to commit or roll back the transaction (session) in order for the query to give you a new result.
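A minimal sketch of that combination, reusing session and user1 from above (expire() is the lazy counterpart of refresh()):

session.commit()  # ends the transaction; with the default expire_on_commit=True this also expires all loaded attributes
session.expire(user1, ['userCreditsCount'])  # explicit per-attribute expiry, useful if expire_on_commit is False
print(user1.userCreditsCount)  # attribute access re-emits the column_property SELECT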

Lifespan of a Contextual Session
I'd make sure you're closing the session when you're done with it.
session = db_session()
try:
    return session.query(User).get(5)
finally:
    session.close()

Set sessionmaker's autocommit to True and see if that helps. According to the documentation, the Session maintains:
the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching.
So in your code it would become:
sessionmaker(bind=engine, autoflush=True, autocommit=True)

Related

Query execution hanging in specific circumstances

The problem
For a while now I've encountered a bug where a data retrieval query keeps hanging during execution. If that were all, then debugging would be fine, but it is not easy to reproduce, namely:
It only occurs on my Linux laptop (Manjaro XFCE), with no problems on my Windows PC
It primarily occurs at a few specific timestamps (mostly 4:05)
Even then it doesn't appear consistently
I know how this can be fixed (by prepending the query with a SELECT 1;), but I don't understand why the problem occurs, or why my solution fixes it, which is where I'm stuck. I've not seen any other problems that specifically describe this issue.
Code
The query in question is below. What it does is select a range of measurements and then average those measurements per timestep (interpolating where necessary) to get a range of averages.
SELECT datetime, AVG(wc) AS wc
FROM (
    SELECT public.time_bucket_gapfill('5 minutes', m.datetime) AS datetime,
           public.interpolate(AVG(m.wc)) AS wc
    FROM growficient.measurement AS m
    INNER JOIN growficient.placement AS p ON m.placement_id = p.id
    WHERE m.datetime >= '2022-09-30T22:00:00+00:00'
      AND m.datetime < '2022-10-01T04:05:00+00:00'
      AND p.section_id = 'bd5114b8-4aab-11eb-af66-32bd66d4e25c'
    GROUP BY public.time_bucket_gapfill('5 minutes', m.datetime), p.id
) AS placement_averages
GROUP BY datetime
ORDER BY datetime;
This is then executed via SQLAlchemy at the session level. When the bug appears, execution never gets to the fetchall().
execute_result = session.execute(query)
readings = execute_result.fetchall()
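(A side note unrelated to the hang: on SQLAlchemy 2.0 a raw SQL string must be wrapped in text() before being passed to session.execute(); on 1.4 the bare string still works but is deprecated.)

from sqlalchemy import text

execute_result = session.execute(text(query))  # text() marks the string as a SQL expression
readings = execute_result.fetchall()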
We're using session management very similar to what's shown in the SQLAlchemy documentation. This is meant to be a debug session, however, meaning that no commit statements are included.
sessionMaker = sessionmaker(
    autocommit=False,
    autoflush=False,
    bind=create_engine(
        config.get_settings().main_db,
        echo=False,
        connect_args=connect_options,
        pool_pre_ping=True,
    ),
)

@contextlib.contextmanager
def managed_session() -> Session:
    session = sessionMaker()
    try:
        yield session
    except Exception as e:
        session.rollback()
        logger.error("Session error: %s", e)
        raise
    finally:
        session.close()
Observations
I can visually see the transaction hanging if I execute select * from pg_catalog.pg_stat_activity psa
Printing the identical query and then executing it directly inside the database (e.g. via DBeaver) correctly returns the results
None of the timeouts mentioned in the Postgres documentation do anything to break out of the hang
Adding a SELECT 1; statement works, but setting pool_pre_ping=True on the engine doesn't, which confuses me as they do the same thing to my understanding

Django Calculated Running Balance

I am working on a personal finance app using Django (creating an API-based backend). I have a transactions table where I need to maintain the running balance based on the UserID, AccountID, and TransactionDate. After creating, updating, or deleting a transaction I need to update the running balance. The current method I have identified is a custom SQL statement which I call once one of the above operations has been done (see code below):
def update_transactions_running_balance(**kwargs):
    from django.db import connection
    querystring = '''UPDATE transactions_transactions
        SET transaction_running_balance = running_balance_calc.calc_running_balance
        FROM (
            SELECT
                transaction_id,
                transaction_user,
                transaction_account,
                SUM(transaction_amount_account_currency) OVER (
                    ORDER BY transaction_user, transaction_account,
                             transaction_date ASC, transaction_id ASC
                ) AS calc_running_balance
            FROM transactions_transactions
            WHERE transaction_user = {}
              AND transaction_account = {}
        ) AS running_balance_calc
        WHERE transactions_transactions.transaction_id = running_balance_calc.transaction_id
          AND transactions_transactions.transaction_user = running_balance_calc.transaction_user
          AND transactions_transactions.transaction_account = running_balance_calc.transaction_account'''.format(
        int(kwargs['transaction_user']), int(kwargs['transaction_account']))
    with connection.cursor() as cursor:
        cursor.execute(querystring)
However, the issue I have is that once the table gets a little larger, the response time starts to increase (the SELECT statement is where the time is taken). The other issue is that when I load the server with multiple concurrent create-transaction requests, every once in a while (currently 0.25% of the time) the running balance update fails with the following error:
ERROR:  deadlock detected.
Process 7038 waits for ShareLock on transaction 5549; blocked by process 7036.
Process 7036 waits for ShareLock on transaction 5551; blocked by process 7038.
I was wondering if there is a better way to do this? I originally wanted to define the running balance as a calculated field on the Django model, but I couldn't figure out how to define it so that it achieves the same result as the code above (see the sketch below). Any help would be appreciated.
Thanks.
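A hedged sketch of that query-time alternative, assuming Django 2.0+ and a hypothetical model class named Transactions mapped to the table above: a window function computes the running balance when reading, so nothing has to be stored or updated after each write.

from django.db.models import F, Sum, Window

# read-side query; partition_by replaces the per-user/per-account WHERE
# filter in the raw SQL, and the ordering mirrors its OVER clause
transactions = Transactions.objects.annotate(
    running_balance=Window(
        expression=Sum('transaction_amount_account_currency'),
        partition_by=[F('transaction_user'), F('transaction_account')],
        order_by=[F('transaction_date').asc(), F('transaction_id').asc()],
    )
)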

Why don't simultaneous updates to the same record in sqlalchemy fail?

(Sorry in advance for the long question. I tried to break it up into sections to make it clearer what I'm asking. Please let me know if I should add anything else or reorganize it at all.)
Background:
I'm writing a web crawler that uses a producer/consumer model with jobs (pages to crawl or re-crawl) stored in a PostgreSQL table called crawler_table. I'm using SQLAlchemy to access and make changes to the table. The exact schema is not important for this question. The important thing is that I (will) have multiple consumers, each of which repeatedly selects a record from the table, loads the page with PhantomJS, and then writes information about the page back to the record.
It can happen on occasion that two consumers select the same job. This is not itself a problem; however, it is important that if they update the record with their results simultaneously, that they make consistent changes. It's good enough for me to just find out if an update would cause the record to become inconsistent. If so, I can deal with it.
Investigation:
I initially assumed that if two transactions in separate sessions read then updated the same record simultaneously, the second one to commit would fail. To test that assumption, I ran the following code (simplified slightly):
SQLAlchemySession = sessionmaker(bind=create_engine(my_postgresql_uri))

class Session(object):
    # A simple wrapper for use with the `with` statement
    def __enter__(self):
        self.session = SQLAlchemySession()
        return self.session

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            self.session.rollback()
        else:
            self.session.commit()
        self.session.close()
with Session() as session:  # Create a record to play with
    if session.query(CrawlerPage) \
              .filter(CrawlerPage.url == 'url').count() == 0:
        session.add(CrawlerPage(website='website', url='url',
                                first_seen=datetime.utcnow()))
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    page.failed_count = 0
# commit

# Actual experiment:
with Session() as session:
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    print 'initial (session)', page.failed_count
    # 0 (expected)
    page.failed_count += 5

    with Session() as other_session:
        same_page = other_session.query(CrawlerPage) \
                                 .filter(CrawlerPage.url == 'url') \
                                 .one()
        print 'initial (other_session)', same_page.failed_count
        # 0 (expected)
        same_page.failed_count += 10
        print 'final (other_session)', same_page.failed_count
        # 10 (expected)
    # commit other_session, no errors (expected)

    print 'final (session)', page.failed_count
    # 5 (expected)
# commit session, no errors (why?)

with Session() as session:
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    print 'final value', page.failed_count
    # 5 (expected, given that there were no errors)
(Apparently Incorrect) Expectations:
I would have expected that reading a value from a record then updating that value within the same transaction would:
Be an atomic operation. That is, either succeed completely or fail completely. This much appears to be true, since the final value is 5, the value set in the last transaction to be committed.
Fail if the record being updated is updated by a concurrent session (other_session) upon attempting to commit the transaction. My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit. In these circumstances, the two transactions read then update the same value of the same record. In a version-control system, this would be the equivalent of a merge conflict. Obviously databases are not the same as version-control systems, but they have enough similarities to inform some of my assumptions about them, for better or worse.
Questions:
Why doesn't the second commit raise an exception?
Am I misunderstanding something about how SQLAlchemy handles transactions?
Am I misunderstanding something about how postgresql handles transactions? (This one seems most likely to me.)
Something else?
Is there a way to get the second commit to raise an exception?
PostgreSQL has SELECT ... FOR UPDATE, which SQLAlchemy seems to support.
My rationale is that all transactions should behave as though they are
performed independently in order of commit whenever possible, or
should fail to commit.
Well, in general there's a lot more to transactions than that. PostgreSQL's default transaction isolation level is "read committed". Loosely speaking, that means multiple transactions can simultaneously read committed values from the same rows in a table. If you want to prevent that, set transaction isolation serializable (might not work), or select...for update, or lock the table, or use a column-by-column WHERE clause, or whatever.
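In SQLAlchemy terms, a minimal sketch of the SELECT ... FOR UPDATE route, reusing the Session wrapper and CrawlerPage model from the question:

with Session() as session:
    # with_for_update() appends FOR UPDATE to the SELECT, so a concurrent
    # transaction running the same query blocks until this one commits
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .with_for_update() \
                  .one()
    page.failed_count += 5
# commit on __exit__ releases the row lock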
You can test and demonstrate transaction behavior by opening two psql connections.
-- Session A:
begin transaction;
select *
from test
where pid = 1
  and date = '2014-10-01'
for update;
-- (1 row)

-- Session B:
begin transaction;
select *
from test
where pid = 1
  and date = '2014-10-01'
for update;
-- (waiting, blocked by Session A's row lock)

-- Session A:
update test
set date = '2014-10-31'
where pid = 1
  and date = '2014-10-01';
commit;

-- Session A's locks are released; Session B's SELECT ... FOR UPDATE
-- now returns, finding no matching row:
-- (0 rows)

use try/except with psycopg2 or "with closing"?

I'm using Psycopg2 in Python to access a PostgreSQL database. I'm curious if it's safe to use the with closing() pattern to create and use a cursor, or if I should use an explicit try/except wrapped around the query. My question is concerning inserting or updating, and transactions.
As I understand it, all Psycopg2 queries occur within a transaction, and it's up to calling code to commit or roll back the transaction. If an error occurs within a with closing(...) block, is a rollback issued? In older versions of Psycopg2, a rollback was explicitly issued on close(), but this is not the case anymore (see http://initd.org/psycopg/docs/connection.html#connection.close).
My question might make more sense with an example. Here's an example using with closing(...
with closing(db.cursor()) as cursor:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()  # note: commit() belongs to the connection, not the cursor
What happens when module.raise_unexpected_error() raises its error? Is the transaction rolled back? As I understand transactions, I either need to commit them or roll them back. So in this case, what happens?
Alternately I could write my query like this:
cursor = None
try:
    cursor = db.cursor()
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()
except BaseException:
    db.rollback()  # rollback(), like commit(), is a connection method
finally:
    if cursor is not None:
        cursor.close()
Also, I should mention that I have no idea whether Psycopg2's connection class's cursor() method can raise an error (the documentation doesn't say), so better safe than sorry, no?
Which method of issuing a query and managing a transaction should I use?
Your link to the Psycopg2 docs kind of explains it itself, no?
... Note that closing a connection without committing the changes first will cause any pending change to be discarded as if a ROLLBACK was performed (unless a different isolation level has been selected: see set_isolation_level()).
Changed in version 2.2: previously an explicit ROLLBACK was issued by Psycopg on close(). The command could have been sent to the backend at an inappropriate time, so Psycopg currently relies on the backend to implicitly discard uncommitted changes. Some middleware are known to behave incorrectly though when the connection is closed during a transaction (when status is STATUS_IN_TRANSACTION), e.g. PgBouncer reports an unclean server and discards the connection. To avoid this problem you can ensure to terminate the transaction with a commit()/rollback() before closing.
So, unless you're using a different isolation level, or using PgBouncer, your first example should work fine. However, if you desire some finer-grained control over exactly what happens during a transaction, then the try/except method might be best, since it parallels the database transaction state itself.
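For reference, psycopg2 connections and cursors are themselves context managers (since psycopg2 2.5), which gives a third option that commits or rolls back for you; a minimal sketch using the db connection from the question:

with db:                         # commits on success, rolls back on exception (the connection stays open)
    with db.cursor() as cursor:  # closes the cursor either way
        cursor.execute("""UPDATE users
                          SET password = %s, salt = %s
                          WHERE user_id = %s""",
                       (pw_tuple[0], pw_tuple[1], user_id))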

Execute some code when an SQLAlchemy object's deletion is actually committed

I have a SQLAlchemy model that represents a file and thus contains the path to an actual file. Since deletion of the database row and the file should go together (so no orphaned files are left and no rows point to deleted files), I added a delete() method to my model class:
def delete(self):
    if os.path.exists(self.path):
        os.remove(self.path)
    db.session.delete(self)
This works fine but has one huge disadvantage: the file is deleted immediately, before the transaction containing the database deletion is committed.
One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.
SQLAlchemy has an after_delete event but according to the docs this is triggered when the SQL is emitted (i.e. on flush) which is too early. It also has an after_commit event but at this point everything deleted in the transaction has probably been deleted from SA.
When using SQLAlchemy in a Flask app with Flask-SQLAlchemy it provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal doing what I'm looking for is extremely easy:
@models_committed.connect_via(app)
def on_models_committed(sender, changes):
    for obj, change in changes:
        if change == 'delete' and hasattr(obj, '__commit_delete__'):
            obj.__commit_delete__()
With this generic function every model that needs on-delete-commit code now simply needs to have a method __commit_delete__(self) and do whatever it needs to do in that method.
It can also be done without Flask-SQLAlchemy; however, in that case it needs some more code (see the sketch after this list):
A deletion needs to be recorded when it's performed. This is done using the after_delete event.
Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
In case the transaction fails or is manually rolled back, the recorded changes also need to be cleared. This is done using the after_rollback() event.
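A minimal sketch of those three steps using SQLAlchemy's event API; the File model class is an assumption standing in for your file-backed model, as is stashing pending paths in session.info:

import os

from sqlalchemy import event
from sqlalchemy.orm import Session, object_session

@event.listens_for(File, 'after_delete')
def record_file_delete(mapper, connection, target):
    # flush time: remember the path, but don't touch the filesystem yet
    session = object_session(target)
    session.info.setdefault('pending_file_deletes', []).append(target.path)

@event.listens_for(Session, 'after_commit')
def handle_file_deletes(session):
    # the row deletions are now permanent, so the files can safely go
    for path in session.info.pop('pending_file_deletes', []):
        if os.path.exists(path):
            os.remove(path)

@event.listens_for(Session, 'after_rollback')
def discard_file_deletes(session):
    # the row deletions never happened, so keep the files
    session.info.pop('pending_file_deletes', None)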
This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem:
The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.
Note: I'm almost positive this could be rewritten in a cleaner manner using SQLAlchemy's new event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)
import logging; log = logging.getLogger(__name__)

from sqlalchemy.orm.session import SessionExtension

class TrackerExtension(SessionExtension):

    def __init__(self):
        self.new = set()
        self.deleted = set()
        self.dirty = set()

    def after_flush(self, session, flush_context):
        # NOTE: requires >= SA 0.5
        self.new.update(obj for obj in session.new
                        if hasattr(obj, "after_db_new"))
        self.deleted.update(obj for obj in session.deleted
                            if hasattr(obj, "after_db_delete"))
        self.dirty.update(obj for obj in session.dirty
                          if hasattr(obj, "after_db_update"))

    def after_commit(self, session):
        # NOTE: this is rather hackneyed, in that it hides errors until
        # the end, just so it can commit as many objects as possible.
        # FIXME: could integrate this w/ twophase to make everything safer
        # in case the methods fail.
        log.debug("after commit: new=%r deleted=%r dirty=%r",
                  self.new, self.deleted, self.dirty)
        ecount = 0
        if self.new:
            for obj in self.new:
                try:
                    obj.after_db_new(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_new: obj=%r",
                                 obj, exc_info=True)
            self.new.clear()
        if self.deleted:
            for obj in self.deleted:
                try:
                    obj.after_db_delete(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_delete: obj=%r",
                                 obj, exc_info=True)
            self.deleted.clear()
        if self.dirty:
            for obj in self.dirty:
                try:
                    obj.after_db_update(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_update: obj=%r",
                                 obj, exc_info=True)
            self.dirty.clear()
        if ecount:
            raise RuntimeError("%r object error during after_commit() ... "
                               "see traceback for more" % ecount)

    def after_rollback(self, session):
        self.new.clear()
        self.deleted.clear()
        self.dirty.clear()
# then add "extension=TrackerExtension()" to the Session constructor
This seems to be a bit challenging. I'm curious whether a SQL trigger AFTER DELETE might be the best route for this, granted it won't be DRY and I'm not sure the SQL database you are using supports it. Still, AFAIK SQLAlchemy pushes transactions to the database but doesn't really know when they have been committed, if I'm interpreting this comment correctly:
its the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publically to other transactions, until the database receives a COMMIT command which is what Session.commit() sends.
taken from "SQLAlchemy: What's the difference between flush() and commit()?", answered by the creator of SQLAlchemy...
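To make the quoted distinction concrete, a short sketch (obj is a hypothetical mapped instance):

session.add(obj)
session.flush()   # emits the INSERT inside the still-open transaction; other transactions can't see the row yet
session.commit()  # flushes any remaining changes, then sends COMMIT, making the row permanent and visible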
If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:
checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
actually deletes the file during the second commit phase.
That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.
