We've been experimenting with sqlalchemy's disconnect handling, and how it integrates with ORM. We've studied the docs, and the advice seems to be to catch the disconnect exception, issue a rollback() and retry the code.
eg:
import sqlalchemy as SA
retry = 2
while retry:
retry -= 1
try:
for name in session.query(Names):
print name
break
except SA.exc.DBAPIError as exc:
if retry and exc.connection_invalidated:
session.rollback()
else:
raise
I follow the rationale -- you have to rollback any active transactions and replay them to ensure a consistent ordering of your actions.
BUT -- this means a lot of extra code added to every function that wants to work with data. Furthermore, in the case of SELECT, we're not modifying data and the concept of rollback/re-request is not only unsightly, but a violation of the principle of DRY (don't repeat yourself).
I was wondering if others would mind sharing how they handle disconnects with sqlalchemy.
FYI: we're using sqlalchemy 0.9.8 and Postgres 9.2.9
The way I like to approach this is place all my database code in a lambda or closure, and pass that into a helper function that will handle catching the disconnect exception, and retrying.
So with your example:
import sqlalchemy as SA
def main():
def query():
for name in session.query(Names):
print name
run_query(query)
def run_query(f, attempts=2):
while attempts > 0:
attempts -= 1
try:
return f() # "break" if query was successful and return any results
except SA.exc.DBAPIError as exc:
if attempts > 0 and exc.connection_invalidated:
session.rollback()
else:
raise
You can make this more fancy by passing a boolean into run_query to handle the case where you are only doing a read, and therefore want to retry without rolling back.
This helps you satisfy the DRY principle since all the ugly boiler-plate code for managing retries + rollbacks is placed in one location.
Using exponential backoff (https://github.com/litl/backoff):
#backoff.on_exception(
backoff.expo,
sqlalchemy.exc.DBAPIError,
factor=7,
max_tries=3,
on_backoff=lambda details: LocalSession.get_main_sql_session().rollback(),
on_giveup=lambda details: LocalSession.get_main_sql_session().flush(), # flush the session
logger=logging
)
def pessimistic_insertion(document_metadata):
LocalSession.get_main_sql_session().add(document_metadata)
LocalSession.get_main_sql_session().commit()
Assuming that LocalSession.get_main_sql_session() returns a singleton.
Related
I'm using the sqlalchemy-aurora-data-api to connect to aurora-postgresql-serverless, with SQLalchemy as an ORM.
For the most part, this has been working fine, but I keep hitting unexpected errors from the aurora_data_api (which sqlalchemy-aurora-data-api is built upon) during commits.
I've tried to handle this in the application logic by catching the exception and re-trying, however, this is still failing:
from aurora_data_api.exceptions import DatabaseError
from botocore.exceptions import ClientError
def handle_invalid_transaction_id(func):
retries = 3
#wraps(func)
def inner(*args, **kwargs):
for i in range(retries):
try:
return func(*args, **kwargs)
except (DatabaseError, ClientError):
if i != retries:
# The aim here is to try and force a new transaction
# If an error occurs and retry
db.session.close()
else:
raise
return inner
And then in my models doing something like this:
class MyModel(db.Model):
#classmethod
#handle_invalid_transaction_id
def create(cls, **kwargs):
instance = cls(**kwargs)
db.session.add(instance)
db.session.commit()
db.session.close()
return kwargs
However, I keep hitting unpredictable transaction failures:
DatabaseError: (aurora_data_api.exceptions.DatabaseError) An error occurred (BadRequestException) when calling the ExecuteStatement operation: Transaction AXwQlogMJsPZgyUXCYFg9gUq4/I9FBEUy1zjMTzdZriEuBCF44s+wMX7+aAnyyJH/6arYcHxbCLW73WE8oRYsPMN17MOrqWfUdxkZRBrM/vBUfrP8FKv6Phfr6kK6o7/0mirCtRJUxDQAQPotaeP+hHj6/IOGUCaOnodt4M3015c0dAycuqhsy4= is not found [+26ms]
It is worth noting that these are not particularly long-running transactions, so I do not think that I'm hitting the transaction expiry issue that can occur with aurora-serverless as documented here.
Is there something fundamentally wrong with my approach to this or is there a better way to handle transactions failures when they occur?
Just to close this off, and in case it helps anyone else, found the issue was in the transactions that were being created by in the cursor here
I can't answer the why, but we noticed that transactions were expiring despite the fact the data successfully committed. e.g:
request 1 - creates a bunch of transactions, write data, exits.
request 2 - creates a bunch of transactions, some transaction id for request 1 fails, exits.
So yeah, I don't think the issue is with the aurora-data-api, but somehow to do with transaction mgmt in general in aurora-serverless. In the end, we forked the repo and refactored so that everything is handled with ExecuteStatment calls rather than using transactions. It's been working fine so far (note we're using SQLalchemy so transactions are handled at the ORM level anyway).
I know I could return, but I'm wondering if there's something else, especially for helper methods where the task where return None would force the caller to add boilerplate checking at each invocation.
I found InvalidTaskError, but no real documentation - is this an internal thing? Is it appropriate to raise this?
I was looking for something like a self.abort() similar to the self.retry(), but didn't see anything.
Here's an example where I'd use it.
def helper(task, arg):
if unrecoverable_problems(arg):
# abort the task
raise InvalidTaskError()
#task(bind=True)
task_a(self, arg):
helper(task=self, arg=arg)
do_a(arg)
#task(bind=True)
task_b(self, arg):
helper(task=self, arg=arg)
do_b(arg)
After doing more digging, I found an example using Reject;
(copied from doc page)
The task may raise Reject to reject the task message using AMQPs basic_reject method. This will not have any effect unless Task.acks_late is enabled.
Rejecting a message has the same effect as acking it, but some brokers may implement additional functionality that can be used. For example RabbitMQ supports the concept of Dead Letter Exchanges where a queue can be configured to use a dead letter exchange that rejected messages are redelivered to.
Reject can also be used to requeue messages, but please be very careful when using this as it can easily result in an infinite message loop.
Example using reject when a task causes an out of memory condition:
import errno
from celery.exceptions import Reject
#app.task(bind=True, acks_late=True)
def render_scene(self, path):
file = get_file(path)
try:
renderer.render_scene(file)
# if the file is too big to fit in memory
# we reject it so that it's redelivered to the dead letter exchange
# and we can manually inspect the situation.
except MemoryError as exc:
raise Reject(exc, requeue=False)
except OSError as exc:
if exc.errno == errno.ENOMEM:
raise Reject(exc, requeue=False)
# For any other error we retry after 10 seconds.
except Exception as exc:
raise self.retry(exc, countdown=10)
I have the following (paraphrased) code that's subject to race conditions:
def calculate_and_cache(template, template_response):
# run a fairly slow and intensive calculation:
calculated_object = calculate_slowly(template, template_response)
cached_calculation = Calculation(calculated=calculated_object,
template=template,
template_response=template_response)
# try to save the calculation just computed:
try:
cached_calculation.save()
return cached_calculation
# if another thread beat you to saving this, catch the exception
# but return the object that was just calculated
except DatabaseError as error:
log(error)
return cached_calculation
And it's raising a DatabaseTransactionError:
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.
The docs have this to say about DTE's:
When exiting an atomic block, Django looks at whether it’s exited normally or with an exception to determine whether to commit or roll back.... If you attempt to run database queries before the rollback happens, Django will raise a TransactionManagementError.
But they also have this, much more vague thing to say about them as well:
TransactionManagementError is raised for any and all problems related to database transactions.
My questions, in order of ascending generality:
Will catching a DatabaseError actually address the race condition by letting the save() exit gracefully while still returning the object?
Where does the atomic block begin in the above code and where does it end?
What am I doing wrong and how can I fix it?
The Django docs on controlling transactions explicitly have an example of catching exceptions in atomic blocks.
In your case, you don't appear to be using the atomic decorator at all, so first you need to add the required import.
from django.db import transaction
Then you need to move the code that could raise a database error into an atomic block:
try:
with transaction.atomic():
cached_calculation.save()
return cached_calculation
# if another thread beat you to saving this, catch the exception
# but return the object that was just calculated
except DatabaseError as error:
log(error)
return cached_calculation
Using Django on a MySQL database I get the following error:
OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The fault rises in the following code:
start_time = 1422086855
end_time = 1422088657
self.model.objects.filter(
user=self.user,
timestamp__gte=start_time,
timestamp__lte=end_time).delete()
for sample in samples:
o = self.model(user=self.user)
o.timestamp = sample.timestamp
...
o.save()
I have several parallell processes working on the same database and sometimes they might have the same job or an overlap in sample data. That's why I need to clear the database and then store the new samples since I don't want any duplicates.
I'm running the whole thing in a transaction block with transaction.commit_on_success() and am getting the OperationalError exception quite often. What I'd prefer is that the transaction doesn't end up in a deadlock, but instead just locks and waits for the other process to be finished with its work.
From what I've read I should order the locks correctly, but I'm not sure how to do this in Django.
What is the easiest way to ensure that I'm not getting this error while still making sure that I don't lose any data?
Use select_for_update() method:
samples = self.model.objects.select_for_update().filter(
user=self.user,
timestamp__gte=start_time,
timestamp__lte=end_time)
for sample in samples:
# do something with a sample
sample.save()
Note that you shouldn't delete selected samples and create new ones. Just update the filtered records. Lock for these records will be released then your transaction will be committed.
BTW instead of __gte/__lte lookups you can use __range:
samples = self.model.objects.select_for_update().filter(
user=self.user,
timestamp__range=(start_time, end_time))
To avoid deadlocks, what I did was implement a way of retrying a query in case a deadlock happens.
In order to do this, what I did was I monkey patched the method "execute" of django's CursorWrapper class. This method is called whenever a query is made, so it will work across the entire ORM and you won't have to worry about deadlocks across your project:
import django.db.backends.utils
from django.db import OperationalError
import time
original = django.db.backends.utils.CursorWrapper.execute
def execute_wrapper(*args, **kwargs):
attempts = 0
while attempts < 3:
try:
return original(*args, **kwargs)
except OperationalError as e:
code = e.args[0]
if attempts == 2 or code != 1213:
raise e
attempts += 1
time.sleep(0.2)
django.db.backends.utils.CursorWrapper.execute = execute_wrapper
What the code above does is: it will try running the query and if an OperationalError is thrown with the error code 1213 (a deadlock), it will wait for 200 ms and try again. It will do this 3 times and if after 3 times the problem was not solved, the original exception is raised.
This code should be executed when the django project is being loaded into memory and so a good place to put it is in the __ini__.py file of any of your apps (I placed in the __ini__.py file of my project's main directory - the one that has the same name as your django project).
Hope this helps anyone in the future.
I have functions in which I am doing database operations. I want to do something special,before trying to fetch data from database, I want to check whether cursor object is null and whether connection is dropped due to time out. How can I do pre checking in the Python ?
my functions, foo, bar :
Class A:
connect = None
cursor = None
def connect(self)
self.connect = MySQLdb.connect ( ... )
self.cursor = self.connect.cursor()
def foo (self) :
self.cursor.execute("SELECT * FROM car_models");
def bar (self) :
self.cursor.execute("SELECT * FROM incomes");
The problem with checking before an operation is that the bad condition you're checking for could happen between the check and the operation.
For example, suppose you had a method is_timed_out() to check if the connection had timed out:
if not self.cursor.is_timed_out():
self.cursor.execute("SELECT * FROM incomes")
On the face of it, this looks like you've avoided the possibility of a CursorTimedOut exception from the execute call. But it's possible for you to call is_timed_out, get a False back, then the cursor times out, and then you call the execute function, and get an exception.
Yes, the chance is very small that it will happen at just the right moment. But in a server environment, a one-in-a-million chance will happen a few times a day. Bad stuff.
You have to be prepared for your operations to fail with exceptions. And once you've got exception handling in place for those problems, you don't need the pre-checks any more, because they are redundant.
You can check whether the cursor is null easily:
if cursor is None:
... do something ...
Otherwise the usual thing in Python is to "ask for forgiveness not permission": use your database connection and if it has timed out catch the exception and handle it (otherwise you might just find that it times out between your test and the point where you use it).