I have a SQLAlchemy model that represents a file and thus contains the path to an actual file on disk. Since deleting the database row and the file should go together (so no orphaned files are left behind and no rows point to deleted files), I added a delete() method to my model class:
def delete(self):
    if os.path.exists(self.path):
        os.remove(self.path)
    db.session.delete(self)
This works fine, but it has one huge disadvantage: the file is deleted immediately, i.e. before the transaction containing the database deletion is committed.
One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.
SQLAlchemy has an after_delete event, but according to the docs it is triggered when the SQL is emitted (i.e. on flush), which is too early. It also has an after_commit event, but at that point everything deleted in the transaction has probably already been discarded from the session.
When using SQLAlchemy in a Flask app with Flask-SQLAlchemy, it provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal, doing what I'm looking for is extremely easy:
@models_committed.connect_via(app)
def on_models_committed(sender, changes):
    for obj, change in changes:
        if change == 'delete' and hasattr(obj, '__commit_delete__'):
            obj.__commit_delete__()
With this generic function, every model that needs on-delete-commit code simply needs to have a __commit_delete__(self) method and do whatever it needs to do in that method.
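For illustration, here is a hedged sketch of how the file model from the question might implement __commit_delete__ (the File class name and its columns are assumptions, not code from the question):

import os

class File(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    path = db.Column(db.String, nullable=False)

    def delete(self):
        # only the row is deleted here; the file is removed after the commit
        db.session.delete(self)

    def __commit_delete__(self):
        # called by on_models_committed() once the DELETE has actually been committed
        if os.path.exists(self.path):
            os.remove(self.path)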
It can also be done without Flask-SQLAlchemy; however, in that case it needs some more code (a minimal sketch follows the list below):
A deletion needs to be recorded when it's performed. This is done using the after_delete event.
Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
In case the transaction fails or is manually rolled back the recorded changes also need to be cleared. This is done using the after_rollback() event.
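A hedged sketch of those three steps using SQLAlchemy's event API (FileModel, the Session class used as the event target, and the _pending_file_deletions bookkeeping dict are illustrative assumptions):

import os
from sqlalchemy import event
from sqlalchemy.orm import Session, object_session

# per-session bookkeeping: session -> set of file paths to delete on commit
_pending_file_deletions = {}

@event.listens_for(FileModel, 'after_delete')
def _record_file_deletion(mapper, connection, target):
    # runs at flush time; only record the path, do not touch the file yet
    session = object_session(target)
    _pending_file_deletions.setdefault(session, set()).add(target.path)

@event.listens_for(Session, 'after_commit')
def _delete_files(session):
    # the row deletions are now permanent, so the files can go too
    for path in _pending_file_deletions.pop(session, ()):
        if os.path.exists(path):
            os.remove(path)

@event.listens_for(Session, 'after_rollback')
def _discard_file_deletions(session):
    # the transaction failed or was rolled back; keep the files
    _pending_file_deletions.pop(session, None)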
This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem:
The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.
Note: I'm almost positive this could be rewritten more cleanly using SQLAlchemy's newer event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)
import logging; log = logging.getLogger(__name__)

from sqlalchemy.orm.session import SessionExtension

class TrackerExtension(SessionExtension):

    def __init__(self):
        self.new = set()
        self.deleted = set()
        self.dirty = set()

    def after_flush(self, session, flush_context):
        # NOTE: requires >= SA 0.5
        self.new.update(obj for obj in session.new
                        if hasattr(obj, "after_db_new"))
        self.deleted.update(obj for obj in session.deleted
                            if hasattr(obj, "after_db_delete"))
        self.dirty.update(obj for obj in session.dirty
                          if hasattr(obj, "after_db_update"))

    def after_commit(self, session):
        # NOTE: this is rather hackneyed, in that it hides errors until
        # the end, just so it can commit as many objects as possible.
        # FIXME: could integrate this w/ twophase to make everything
        # safer in case the methods fail.
        log.debug("after commit: new=%r deleted=%r dirty=%r",
                  self.new, self.deleted, self.dirty)

        ecount = 0

        if self.new:
            for obj in self.new:
                try:
                    obj.after_db_new(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_new: obj=%r",
                                 obj, exc_info=True)
            self.new.clear()

        if self.deleted:
            for obj in self.deleted:
                try:
                    obj.after_db_delete(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_delete: obj=%r",
                                 obj, exc_info=True)
            self.deleted.clear()

        if self.dirty:
            for obj in self.dirty:
                try:
                    obj.after_db_update(session)
                except:
                    ecount += 1
                    log.critical("error occurred in after_db_update: obj=%r",
                                 obj, exc_info=True)
            self.dirty.clear()

        if ecount:
            raise RuntimeError("%r object error during after_commit() ... "
                               "see traceback for more" % ecount)

    def after_rollback(self, session):
        self.new.clear()
        self.deleted.clear()
        self.dirty.clear()
# then add "extension=TrackerExtension()" to the Session constructor
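For example, a hedged usage sketch (the extension argument belongs to the legacy SessionExtension API this code targets, and `engine` is assumed to already exist):

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine, extension=TrackerExtension())
session = Session()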
This seems to be a bit challenging. I'm curious whether an SQL AFTER DELETE trigger might be the best route for this, granted it won't be DRY and I'm not sure the SQL database you are using supports it. Still, AFAIK SQLAlchemy pushes transactions to the database but doesn't really know when they have been committed, if I'm interpreting this comment correctly:
it's the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publicly to other transactions, until the database receives a COMMIT command, which is what Session.commit() sends.
taken from "SQLAlchemy: What's the difference between flush() and commit()?", answered by the creator of SQLAlchemy.
If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:
checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
actually deletes the file during the second commit phase.
That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.
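As a hedged illustration of the filesystem half of that idea (just the prepare/delete split from the two bullets above, not a real XA integration; the FileDeletion class and its checks are assumptions):

import os

class FileDeletion(object):
    """Stages a file deletion: prepare() checks it can happen, commit() does it."""

    def __init__(self, path):
        self.path = path

    def prepare(self):
        # first phase: verify the file exists and its directory is writable
        if not os.path.isfile(self.path):
            raise RuntimeError("no such file: %s" % self.path)
        parent = os.path.dirname(self.path) or "."
        if not os.access(parent, os.W_OK):
            raise RuntimeError("cannot delete from %s" % parent)

    def commit(self):
        # second phase: the actual delete; this step can still fail unexpectedly
        os.remove(self.path)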
I'm using the sqlalchemy-aurora-data-api to connect to aurora-postgresql-serverless, with SQLAlchemy as the ORM.
For the most part this has been working fine, but I keep hitting unexpected errors from aurora_data_api (which sqlalchemy-aurora-data-api is built upon) during commits.
I've tried to handle this in the application logic by catching the exception and retrying; however, this is still failing:
from functools import wraps

from aurora_data_api.exceptions import DatabaseError
from botocore.exceptions import ClientError


def handle_invalid_transaction_id(func):
    retries = 3

    @wraps(func)
    def inner(*args, **kwargs):
        for i in range(retries):
            try:
                return func(*args, **kwargs)
            except (DatabaseError, ClientError):
                if i != retries:
                    # The aim here is to force a new transaction
                    # if an error occurs, and then retry
                    db.session.close()
                else:
                    raise

    return inner
And then in my models doing something like this:
class MyModel(db.Model):

    @classmethod
    @handle_invalid_transaction_id
    def create(cls, **kwargs):
        instance = cls(**kwargs)
        db.session.add(instance)
        db.session.commit()
        db.session.close()
        return kwargs
However, I keep hitting unpredictable transaction failures:
DatabaseError: (aurora_data_api.exceptions.DatabaseError) An error occurred (BadRequestException) when calling the ExecuteStatement operation: Transaction AXwQlogMJsPZgyUXCYFg9gUq4/I9FBEUy1zjMTzdZriEuBCF44s+wMX7+aAnyyJH/6arYcHxbCLW73WE8oRYsPMN17MOrqWfUdxkZRBrM/vBUfrP8FKv6Phfr6kK6o7/0mirCtRJUxDQAQPotaeP+hHj6/IOGUCaOnodt4M3015c0dAycuqhsy4= is not found [+26ms]
It is worth noting that these are not particularly long-running transactions, so I do not think that I'm hitting the transaction expiry issue that can occur with aurora-serverless as documented here.
Is there something fundamentally wrong with my approach to this, or is there a better way to handle transaction failures when they occur?
Just to close this off, and in case it helps anyone else: we found the issue was in the transactions that were being created by the cursor here.
I can't answer the why, but we noticed that transactions were expiring despite the fact that the data was successfully committed, e.g.:
request 1 - creates a bunch of transactions, writes data, exits.
request 2 - creates a bunch of transactions, a transaction id from request 1 fails, exits.
So yeah, I don't think the issue is with the aurora-data-api, but somehow to do with transaction management in general in aurora-serverless. In the end, we forked the repo and refactored so that everything is handled with ExecuteStatement calls rather than using transactions. It's been working fine so far (note we're using SQLAlchemy, so transactions are handled at the ORM level anyway).
I want to create a Database class which can create cursors on demand.
It must be possible to use the cursors in parallel (two or more cursors can coexist) and, since we can only have one cursor per connection, the Database class must handle multiple connections.
For performance reasons we want to reuse connections as much as possible and avoid creating a new connection every time a cursor is created:
whenever a request is made the class will try to find, among the opened connections, the first non-busy connection and use it.
A connection is still busy as long as the cursor has not been consumed.
Here is an example of such class:
class Database:

    ...

    def get_cursor(self, query):
        selected_connection = None

        # Find a usable connection
        for con in self.connections:
            if con.is_busy() == False:  # <--- This is not PEP 249
                selected_connection = con
                break

        # If all connections are busy, create a new one
        if selected_connection is None:
            selected_connection = self._new_connection()
            self.connections.append(selected_connection)

        # Return cursor on query
        cur = selected_connection.cursor()
        cur.execute(query)

        return cur
However, looking at the PEP 249 standard I cannot find any way to check whether a connection is actually being used or not.
Some implementations such as MySQL Connector offer ways to check whether a connection still has unread content (see here); however, as far as I know those are not part of PEP 249.
Is there a way I can achieve what is described above with any PEP 249 compliant Python database API?
Perhaps you could use the status of the cursor to tell you if a cursor is being used. Let's say you had the following cursor:
new_cursor = new_connection.cursor()
new_cursor.execute(new_query)
and you wanted to see if that connection was available for another cursor to use. You might be able to do something like:
if new_cursor.rowcount == -1:
    another_new_cursor = new_connection.cursor()
    ...
Of course, all this really tells you is that the cursor hasn't executed anything yet since the last time it was closed. It could point to a cursor that is done (and therefore a connection that has been closed) or it could point to a cursor that has just been created or attached to a connection. Another option is to use a try/except block, something along the lines of:
try:
    another_new_cursor = new_connection.cursor()
except ConnectionError:  # not actually sure which error would go here, but you get the idea
    print("this connection is busy.")
Of course, you probably don't want to be spammed with printed messages but you can do whatever you want in that except block, sleep for 5 seconds, wait for some other variable to be passed, wait for user input, etc. If you are restricted to PEP 249, you are going to have to do a lot of things from scratch. Is there a reason you can't use external libraries?
EDIT: If you are willing to move outside of PEP 249, here is something that might work, but it may not be suitable for your purposes. If you make use of the MySQL Connector/Python library, you can take advantage of the is_connected method.
new_connection = mysql.connector.connect(host='myhost',
                                         database='myDB',
                                         user='me',
                                         password='myPassword')

# ...stuff happens...

if new_connection.is_connected():
    pass
else:
    another_new_cursor = new_connection.cursor()
    ...
I am working on an asset management system for a personal project. My question is how to handle file system operations in Python cleanly and efficiently, so that I can roll back or undo changes if something goes wrong.
A typical operation might look something like this:
try:
    file system operation(s)
    update database
except Exception:
    undo file system operations already performed
    rollback database transaction
    handle exceptions
File system operations can be things like creating, copying, linking, and removing files/directories.
My idea was to have a context manager for both the file system operations and the database management. The execution would be something like this:
# create new asset
with FileSystemCM() as fs, DatabaseCM() as db:
    fs.create_dir(path_to_asset)
    fs.create_file(path_to_a_file_this_asset_needs)
    db.insert('Asset_Table', asset_name)
Now if, for example, db.insert fails, the FileSystemCM removes the newly created file and the newly created directory, and DatabaseCM rolls back the db transaction.
A simple approach to my FileSystemCM implementation would be something like this:
class FileSystemCM(object):
    """ File System Context Manager """

    def __init__(self):
        self.undo_stack = []  # list of (fn, args, kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_val, traceback):
        if exception_type:
            # an exception occurred: pop undo actions off the stack and execute them
            while self.undo_stack:
                undo_fn, args, kwargs = self.undo_stack.pop()
                undo_fn(*args, **kwargs)

    # the calls below rely on module-level helpers (create_dir, create_file,
    # remove_dir, remove_file) that are not shown here

    def create_dir(self, dir_path):
        create_dir(dir_path)
        self.undo_stack.append((remove_dir, [dir_path], {'force': True}))

    def create_file(self, file_path):
        create_file(file_path)
        self.undo_stack.append((remove_file, [file_path], {'force': True}))
Is there a better approach to this? There are circumstances that this implementation won't handle that I could use feedback on:
Deleting files. My thought is to move files marked for removal to a temp location (or create a temporary hard link); if everything goes OK, remove the temp files or links, otherwise move them back (a sketch of this staging idea follows this list). But this can lead to the situation below.
The __exit__ code throwing an exception and not finishing the undo operations; perhaps I should leave a log file so that at least things can be cleaned up manually?
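A hedged sketch of the staging idea from the first point (stage_removal, its staging-directory argument, and the undo/finalize callbacks are illustrative assumptions, not part of the implementation above):

import os
import shutil

def stage_removal(path, staging_dir):
    """Move `path` aside instead of deleting it right away.

    Returns (undo, finalize): call undo() to restore the file on rollback,
    or finalize() to really delete it once everything else succeeded.
    """
    staged = os.path.join(staging_dir, os.path.basename(path))
    shutil.move(path, staged)

    def undo():
        shutil.move(staged, path)

    def finalize():
        os.remove(staged)

    return undo, finalize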
I meant this as a comment, but it's too long to fit in the comments section. Let me start by saying that this sounds like a very interesting project (at least for my taste).
Some time ago (I can't remember where) I read an article about implementing Undo/Redo functionality, and what it does is maintain two separate stacks (one for undo and one for redo). When the user performs an action, a pair of the action and its reverse, with their arguments, is pushed onto the undo stack. Whenever the user performs the undo action, the reverse action from the pair is executed and the pair is moved onto the redo stack; when the redo action is performed, the action from the pair gets executed and the pair is moved back onto the undo stack.
Whenever the user performs a new action, the redo stack is cleared. The only drawback of this approach is irreversible actions. One way I can think of to overcome that is to use some sort of Event Sourcing pattern, where you keep the whole state of the system and its diffs. This might seem very inefficient, but it is commonly used in software.
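A minimal, hedged sketch of that two-stack scheme (the UndoRedo class and method names are illustrative):

class UndoRedo:
    def __init__(self):
        self.undo_stack = []  # entries are (do_fn, undo_fn, args, kwargs)
        self.redo_stack = []

    def perform(self, do_fn, undo_fn, *args, **kwargs):
        do_fn(*args, **kwargs)
        self.undo_stack.append((do_fn, undo_fn, args, kwargs))
        self.redo_stack.clear()  # a new action invalidates the redo history

    def undo(self):
        do_fn, undo_fn, args, kwargs = self.undo_stack.pop()
        undo_fn(*args, **kwargs)
        self.redo_stack.append((do_fn, undo_fn, args, kwargs))

    def redo(self):
        do_fn, undo_fn, args, kwargs = self.redo_stack.pop()
        do_fn(*args, **kwargs)
        self.undo_stack.append((do_fn, undo_fn, args, kwargs))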
I have this Django views.py method that aims to insert a lot of data into the db. It loops through arrays of models and, if an object isn't already in the db, it gets inserted.
This is what the code looks like:
def update_my_db(request):
    a_models = A_Model.objects.filter(my_flag=True)
    for a_model in a_models:
        b_model_array = []
        [...]  # this is where b_model_array gets filled
        for index in range(len(b_model_array)):
            current_b_model = b_model_array[index]
            try:
                b_model = B_Model.objects.get(my_field=current_b_model.my_field)
            except (KeyError, B_Model.DoesNotExist):
                b_model = B_Model.objects.create(field_1=current_b_model.field_1,
                                                 field_2=current_b_model.field_2)
            b_model.save()
    return HttpResponse(response)
I have noticed after several tests that the db is only updated at the end of the last iteration, as if Django waits to do a batch insert to MySQL.
The thing is: there is a possibility of any of the iterations raising an exception, making all the data gathered so far be discarded because of the error (already tested and confirmed it). When it comes to adding 400 new lines, raising an exception at loop #399 and discarding all the previous 398 lines would be extremely undesirable for me.
I understand that batching would be the best choice concerning performance, but this is a background routine, so I'm not worried about it.
Bottom line: is there a way to actually force Django to update the database on every iteration?
If you're on Django 1.6, check this out: https://docs.djangoproject.com/en/dev/topics/db/transactions/
You're interested in the context manager part of that page:
from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
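As a hedged adaptation to the loop from the question (assuming the view itself runs in autocommit mode, with no outer atomic block or ATOMIC_REQUESTS wrapping it), each iteration can be given its own atomic block so one failing row no longer discards the earlier ones:

from django.db import transaction

for index in range(len(b_model_array)):
    current_b_model = b_model_array[index]
    # each iteration commits (or rolls back) on its own
    with transaction.atomic():
        try:
            b_model = B_Model.objects.get(my_field=current_b_model.my_field)
        except B_Model.DoesNotExist:
            b_model = B_Model.objects.create(field_1=current_b_model.field_1,
                                             field_2=current_b_model.field_2)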
I have an issue with SQLAlchemy apparently committing. A rough sketch of my code:
trans = self.conn.begin()
try:
    assert not self.conn.execute(my_obj.__table__.select(my_obj.id == id)).first()
    self.conn.execute(my_obj.__table__.insert().values(id=id))
    assert not self.conn.execute(my_obj.__table__.select(my_obj.id == id)).first()
except:
    trans.rollback()
    raise
I don't commit, and the second assert always fails! In other words, it seems the data is getting inserted into the database even though the code is within a transaction! Is this assessment accurate?
You're right in that the changes aren't getting committed to the DB. But they are auto-flushed by SQLAlchemy when you perform a query; in your case the flush happens on the lines with the asserts. So if you don't explicitly call commit, you will never see these changes in the DB as real data. However, you will get them back as long as you use the same conn object.
You can pass autoflush=False to the session constructor to disable this behavior.
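For example (a hedged sketch; `engine` is assumed to already exist):

from sqlalchemy.orm import sessionmaker

# sessions created by this factory will not flush automatically before queries
Session = sessionmaker(bind=engine, autoflush=False)
session = Session()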