Does begin_nested() automatically rollback/commit? - python

When begin_nested is used as a context manager, e.g.
with db.session.begin_nested():
    # do something
If an IntegrityError is thrown, will db.session.rollback() be called automatically? Conversely, if no exception is thrown, will db.session.commit() be called automatically?

If a transaction, such as one from begin_nested(), is used as a context manager, the transaction is committed on exit, or rolled back if there was an error in the block or during the commit.
Here is the relevant source: https://github.com/zzzeek/sqlalchemy/blob/81518ae2e2bc622f8cd47287a575ad4c0e43ead1/lib/sqlalchemy/orm/session.py#L558-L569
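The commit-on-exit, rollback-on-error behaviour can be illustrated without SQLAlchemy. The following is a minimal sketch using the standard-library sqlite3 module and a raw SAVEPOINT. The nested() helper and the users table are made up for illustration; this is not SQLAlchemy's implementation, only an analogy for what the context manager does.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def nested(conn, name="sp"):
    """Hypothetical helper: release the savepoint on success, roll back to it on error."""
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield
    except Exception:
        # Undo everything done since the savepoint, then drop the savepoint.
        conn.execute(f"ROLLBACK TO {name}")
        conn.execute(f"RELEASE {name}")
        raise
    else:
        conn.execute(f"RELEASE {name}")


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT UNIQUE)")

conn.execute("BEGIN")
with nested(conn):
    conn.execute("INSERT INTO users VALUES ('alice')")

try:
    with nested(conn):
        conn.execute("INSERT INTO users VALUES ('bob')")
        conn.execute("INSERT INTO users VALUES ('alice')")  # violates UNIQUE
except sqlite3.IntegrityError:
    pass  # the nested block was rolled back; the outer transaction is still usable
conn.execute("COMMIT")

names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]
print(names)
```

Only the work from the successful nested block survives; the insert of 'bob' is discarded together with the failing statement.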

Related

SQLAlchemy, aurora-serverless invalid transaction issue on commit (aurora_data_api.exceptions.DatabaseError)

I'm using sqlalchemy-aurora-data-api to connect to aurora-postgresql-serverless, with SQLAlchemy as the ORM.
For the most part, this has been working fine, but I keep hitting unexpected errors from the aurora_data_api (which sqlalchemy-aurora-data-api is built upon) during commits.
I've tried to handle this in the application logic by catching the exception and retrying; however, this is still failing:
from functools import wraps
from aurora_data_api.exceptions import DatabaseError
from botocore.exceptions import ClientError
def handle_invalid_transaction_id(func):
    retries = 3
    @wraps(func)
    def inner(*args, **kwargs):
        for i in range(retries):
            try:
                return func(*args, **kwargs)
            except (DatabaseError, ClientError):
                # i runs from 0 to retries - 1, so compare against the last index;
                # "i != retries" is always true and would swallow the final error
                if i < retries - 1:
                    # The aim here is to force a new transaction
                    # if an error occurs, and then retry
                    db.session.close()
                else:
                    raise
    return inner
And then in my models doing something like this:
class MyModel(db.Model):
    @classmethod
    @handle_invalid_transaction_id
    def create(cls, **kwargs):
        instance = cls(**kwargs)
        db.session.add(instance)
        db.session.commit()
        db.session.close()
        return kwargs
However, I keep hitting unpredictable transaction failures:
DatabaseError: (aurora_data_api.exceptions.DatabaseError) An error occurred (BadRequestException) when calling the ExecuteStatement operation: Transaction AXwQlogMJsPZgyUXCYFg9gUq4/I9FBEUy1zjMTzdZriEuBCF44s+wMX7+aAnyyJH/6arYcHxbCLW73WE8oRYsPMN17MOrqWfUdxkZRBrM/vBUfrP8FKv6Phfr6kK6o7/0mirCtRJUxDQAQPotaeP+hHj6/IOGUCaOnodt4M3015c0dAycuqhsy4= is not found [+26ms]
It is worth noting that these are not particularly long-running transactions, so I do not think that I'm hitting the transaction expiry issue that can occur with aurora-serverless as documented here.
Is there something fundamentally wrong with my approach to this or is there a better way to handle transactions failures when they occur?
Just to close this off, and in case it helps anyone else: we found the issue was in the transactions that were being created in the cursor here
I can't answer the why, but we noticed that transactions were expiring despite the fact the data successfully committed. e.g:
request 1 - creates a bunch of transactions, write data, exits.
request 2 - creates a bunch of transactions, some transaction id for request 1 fails, exits.
So yeah, I don't think the issue is with the aurora-data-api, but is somehow to do with transaction management in general in aurora-serverless. In the end, we forked the repo and refactored it so that everything is handled with ExecuteStatement calls rather than using transactions. It's been working fine so far (note we're using SQLAlchemy, so transactions are handled at the ORM level anyway).

How to avoid/fix django's DatabaseTransactionError

I have the following (paraphrased) code that's subject to race conditions:
def calculate_and_cache(template, template_response):
    # run a fairly slow and intensive calculation:
    calculated_object = calculate_slowly(template, template_response)
    cached_calculation = Calculation(calculated=calculated_object,
                                     template=template,
                                     template_response=template_response)
    # try to save the calculation just computed:
    try:
        cached_calculation.save()
        return cached_calculation
    # if another thread beat you to saving this, catch the exception
    # but return the object that was just calculated
    except DatabaseError as error:
        log(error)
        return cached_calculation
And it's raising a TransactionManagementError:
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.
The docs have this to say about these errors:
When exiting an atomic block, Django looks at whether it’s exited normally or with an exception to determine whether to commit or roll back.... If you attempt to run database queries before the rollback happens, Django will raise a TransactionManagementError.
But they also have this much vaguer thing to say about them:
TransactionManagementError is raised for any and all problems related to database transactions.
My questions, in order of ascending generality:
Will catching a DatabaseError actually address the race condition by letting the save() exit gracefully while still returning the object?
Where does the atomic block begin in the above code and where does it end?
What am I doing wrong and how can I fix it?
The Django docs on controlling transactions explicitly have an example of catching exceptions in atomic blocks.
In your case, you don't appear to be using the atomic decorator at all, so first you need to add the required import.
from django.db import transaction
Then you need to move the code that could raise a database error into an atomic block:
try:
    with transaction.atomic():
        cached_calculation.save()
    return cached_calculation
# if another thread beat you to saving this, catch the exception
# but return the object that was just calculated
except DatabaseError as error:
    log(error)
    return cached_calculation

python 'with' statement, should I use contextlib.closing?

from contextlib import closing
def init_db():
    with closing(connect_db()) as db:
        with app.open_resource('schema.sql') as f:
            db.cursor().executescript(f.read())
        db.commit()
This is from the Flask tutorial, Step 3 (http://flask.pocoo.org/docs/tutorial/dbinit/#tutorial-dbinit), and I'm a little curious about the line that wraps connect_db() in closing().
Must I import and use contextlib.closing()?
When I learned about the with statement, many articles said that it closes the file automatically when the block finishes, as below (the same as finally: thing.close()).
with open('filename', 'w') as f:
    f.write(someString)
What is the difference if I don't use contextlib.closing(), as below?
I'm using Python 2.7.6. Thank you.
def init_db():
    with connect_db() as db:
        with app.open_resource('schema.sql') as f:
            db.cursor().executescript(f.read())
        db.commit()
Yes, you should be using contextlib.closing(); your own version does something different entirely.
The with statement lets a context manager know when a block of code is entered and exited; on exit the context manager is also given access to the exception, if one occurred. File objects use this to automatically close the file when the block is exited.
The connect_db() function from the tutorial returns a sqlite3 connection object, which can indeed be used as a context manager. However, the connection.__exit__() method doesn't close the connection, it commits the transaction on a successful completion, or aborts it when there is an exception.
The contextlib.closing() context manager on the other hand, calls the connection.close() method on the connection. This is something entirely different.
So, your second snippet may work, but does something different. The tutorial code closes the connection, your version commits a transaction. You are already calling db.commit(), so the action is actually redundant provided no exceptions are raised.
You could use the connection as a context manager again to have the automatic transaction handling behaviour:
def init_db():
    with closing(connect_db()) as db:
        with app.open_resource('schema.sql') as f, db:
            db.cursor().executescript(f.read())
Note the , db on the second with line, ensuring that the db.__exit__() method is called when the block exits.
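The difference between the two context managers can be checked directly. This is a small sketch using an in-memory sqlite3 database (the table t is made up for illustration): using the connection itself in a with statement commits or rolls back but leaves the connection open, while closing() closes it.

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")

# The connection as a context manager: commits on success, does NOT close.
with conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
count_after_with = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# An exception inside the block rolls the pending insert back.
try:
    with conn:
        conn.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("boom")
except RuntimeError:
    pass
count_after_error = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()

# closing(): calls close() on exit, with no commit or rollback of its own.
with closing(sqlite3.connect(":memory:")) as conn2:
    conn2.execute("CREATE TABLE t (x INTEGER)")
try:
    conn2.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    # sqlite3 refuses to operate on a closed connection
    closed = True

print(count_after_with, count_after_error, closed)
```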
The only thing the with statement does is call the __enter__ method before entering its block and the __exit__ method when leaving it.
If those methods are not defined, the with statement won't work as you may expect. I don't know what the return type of connect_db is, but I guess that it could be many different things from different third-party libraries. So, your code without closing will probably work in many (all?) cases, but you never know what can be returned by connect_db.

use try/except with psycopg2 or "with closing"?

I'm using Psycopg2 in Python to access a PostgreSQL database. I'm curious if it's safe to use the with closing() pattern to create and use a cursor, or if I should use an explicit try/except wrapped around the query. My question is concerning inserting or updating, and transactions.
As I understand it, all Psycopg2 queries occur within a transaction, and it's up to calling code to commit or rollback the transaction. If within a with closing(... block an error occurs, is a rollback issued? In older versions of Psycopg2, a rollback was explicitly issued on close() but this is not the case anymore (see http://initd.org/psycopg/docs/connection.html#connection.close).
My question might make more sense with an example. Here's an example using with closing(...
with closing(db.cursor()) as cursor:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    # commit() is a method of the connection, not of the cursor
    db.commit()
What happens when module.raise_unexpected_error() raises its error? Is the transaction rolled back? As I understand transactions, I either need to commit them or roll them back. So in this case, what happens?
Alternately I could write my query like this:
cursor = None
try:
    cursor = db.cursor()
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    # commit() and rollback() are methods of the connection, not of the cursor
    db.commit()
except BaseException:
    if cursor is not None:
        db.rollback()
finally:
    if cursor is not None:
        cursor.close()
Also I should mention that I have no idea if Psycopg2's connection class cursor() method could raise an error or not (the documentation doesn't say) so better safe than sorry, no?
Which method of issuing a query and managing a transaction should I use?
Your link to the Psycopg2 docs kind of explains it itself, no?
... Note that closing a connection without committing the changes first will
cause any pending change to be discarded as if a ROLLBACK was
performed (unless a different isolation level has been selected: see
set_isolation_level()).
Changed in version 2.2: previously an explicit ROLLBACK was issued by
Psycopg on close(). The command could have been sent to the backend at
an inappropriate time, so Psycopg currently relies on the backend to
implicitly discard uncommitted changes. Some middleware are known to
behave incorrectly though when the connection is closed during a
transaction (when status is STATUS_IN_TRANSACTION), e.g. PgBouncer
reports an unclean server and discards the connection. To avoid this
problem you can ensure to terminate the transaction with a
commit()/rollback() before closing.
So, unless you're using a different isolation level, or using PgBouncer, your first example should work fine. However, if you desire some finer-grained control over exactly what happens during a transaction, then the try/except method might be best, since it parallels the database transaction state itself.
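The two patterns can also be combined. Below is a minimal sketch of a helper that closes the cursor and always ends the transaction explicitly, which is what the documentation quoted above recommends before closing a connection. The transaction() name is hypothetical, and the demo uses the standard-library sqlite3 module so it runs without a server; the same shape should apply to any DB-API connection with commit() and rollback(), though psycopg2 uses %s placeholders rather than ?.

```python
import sqlite3
from contextlib import closing, contextmanager


@contextmanager
def transaction(conn):
    """Hypothetical helper: yield a cursor, commit on success, roll back on error."""
    with closing(conn.cursor()) as cursor:
        try:
            yield cursor
        except BaseException:
            conn.rollback()
            raise
        else:
            conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'old')")
conn.commit()

# An error after the UPDATE: the change is rolled back and the error propagates.
try:
    with transaction(conn) as cur:
        cur.execute("UPDATE users SET password = ? WHERE user_id = ?", ("new", 1))
        raise RuntimeError("unexpected error")
except RuntimeError:
    pass
after_failure = conn.execute(
    "SELECT password FROM users WHERE user_id = 1").fetchone()[0]

# No error: the change is committed when the block exits.
with transaction(conn) as cur:
    cur.execute("UPDATE users SET password = ? WHERE user_id = ?", ("new", 1))
after_success = conn.execute(
    "SELECT password FROM users WHERE user_id = 1").fetchone()[0]

print(after_failure, after_success)
```

Unlike the try/except version in the question, this sketch re-raises the error after rolling back, so the caller still sees the failure.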

Execute some code when an SQLAlchemy object's deletion is actually committed

I have a SQLAlchemy model that represents a file and thus contains the path to an actual file. Since deletion of the database row and of the file should go together (so no orphaned files are left and no rows point to deleted files), I added a delete() method to my model class:
def delete(self):
    if os.path.exists(self.path):
        os.remove(self.path)
    db.session.delete(self)
This works fine but has one huge disadvantage: the file is deleted immediately, before the transaction containing the database deletion is committed.
One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.
SQLAlchemy has an after_delete event but according to the docs this is triggered when the SQL is emitted (i.e. on flush) which is too early. It also has an after_commit event but at this point everything deleted in the transaction has probably been deleted from SA.
When using SQLAlchemy in a Flask app with Flask-SQLAlchemy it provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal doing what I'm looking for is extremely easy:
@models_committed.connect_via(app)
def on_models_committed(sender, changes):
    for obj, change in changes:
        if change == 'delete' and hasattr(obj, '__commit_delete__'):
            obj.__commit_delete__()
With this generic function every model that needs on-delete-commit code now simply needs to have a method __commit_delete__(self) and do whatever it needs to do in that method.
It can also be done without Flask-SQLAlchemy, however, in this case it needs some more code:
A deletion needs to be recorded when it's performed. This is done using the after_delete event.
Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
In case the transaction fails or is manually rolled back the recorded changes also need to be cleared. This is done using the after_rollback() event.
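The bookkeeping behind those three steps can be sketched in plain Python. The PendingFileDeletes class and its method names are hypothetical, and the wiring to SQLAlchemy is deliberately left out: the assumption is that record() would be called from an after_delete handler, on_commit() from after_commit, and on_rollback() from after_rollback.

```python
import os
import tempfile


class PendingFileDeletes:
    """Hypothetical helper: remember file paths and remove them only on commit."""

    def __init__(self):
        self._paths = []

    def record(self, path):
        # Step 1: note the deletion when the row is deleted (flush time).
        self._paths.append(path)

    def on_commit(self):
        # Step 2: the COMMIT succeeded, so the files can actually go.
        paths, self._paths = self._paths, []
        for path in paths:
            if os.path.exists(path):
                os.remove(path)

    def on_rollback(self):
        # Step 3: the transaction was rolled back, so forget the recorded paths.
        self._paths.clear()


def make_temp_file():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    return path


pending = PendingFileDeletes()

# A rolled-back deletion leaves the file in place.
rolled_back_path = make_temp_file()
pending.record(rolled_back_path)
pending.on_rollback()
survived_rollback = os.path.exists(rolled_back_path)

# A committed deletion removes the file.
committed_path = make_temp_file()
pending.record(committed_path)
pending.on_commit()
removed_on_commit = not os.path.exists(committed_path)

os.remove(rolled_back_path)  # clean up the demo file
print(survived_rollback, removed_on_commit)
```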
This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem:
The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.
Note: I'm almost positive this could be rewritten in a cleaner manner using SqlAlchemy's new event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)
import logging
from sqlalchemy.orm.session import SessionExtension
log = logging.getLogger(__name__)
class TrackerExtension(SessionExtension):
    def __init__(self):
        self.new = set()
        self.deleted = set()
        self.dirty = set()
    def after_flush(self, session, flush_context):
        # NOTE: requires >= SA 0.5
        self.new.update(obj for obj in session.new
                        if hasattr(obj, "after_db_new"))
        self.deleted.update(obj for obj in session.deleted
                            if hasattr(obj, "after_db_delete"))
        self.dirty.update(obj for obj in session.dirty
                          if hasattr(obj, "after_db_update"))
    def after_commit(self, session):
        # NOTE: this is rather hackneyed, in that it hides errors until
        # the end, just so it can commit as many objects as possible.
        # FIXME: could integrate this w/ twophase to make everything safer in case the methods fail.
        log.debug("after commit: new=%r deleted=%r dirty=%r",
                  self.new, self.deleted, self.dirty)
        ecount = 0
        if self.new:
            for obj in self.new:
                try:
                    obj.after_db_new(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_new: obj=%r",
                                 obj, exc_info=True)
            self.new.clear()
        if self.deleted:
            for obj in self.deleted:
                try:
                    obj.after_db_delete(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_delete: obj=%r",
                                 obj, exc_info=True)
            self.deleted.clear()
        if self.dirty:
            for obj in self.dirty:
                try:
                    obj.after_db_update(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_update: obj=%r",
                                 obj, exc_info=True)
            self.dirty.clear()
        if ecount:
            raise RuntimeError("%r object error during after_commit() ... "
                               "see traceback for more" % ecount)
    def after_rollback(self, session):
        self.new.clear()
        self.deleted.clear()
        self.dirty.clear()
# then add "extension=TrackerExtension()" to the Session constructor
This seems to be a bit challenging. I'm curious whether a SQL AFTER DELETE trigger might be the best route for this; granted, it won't be DRY, and I'm not sure the SQL database you are using supports it. Still, AFAIK SQLAlchemy pushes transactions to the DB but doesn't really know when they have been committed, if I'm interpreting this comment correctly:
its the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publically to other transactions, until the database receives a COMMIT command which is what Session.commit() sends.
taken from SQLAlchemy: What's the difference between flush() and commit()? by the creator of sqlalchemy ...
If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:
checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
actually deletes the file during the second commit phase.
That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.
