I have the following (paraphrased) code that's subject to race conditions:
def calculate_and_cache(template, template_response):
    # run a fairly slow and intensive calculation:
    calculated_object = calculate_slowly(template, template_response)
    cached_calculation = Calculation(calculated=calculated_object,
                                     template=template,
                                     template_response=template_response)
    # try to save the calculation just computed:
    try:
        cached_calculation.save()
        return cached_calculation
    # if another thread beat you to saving this, catch the exception
    # but return the object that was just calculated
    except DatabaseError as error:
        log(error)
        return cached_calculation
And it's raising a TransactionManagementError:
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.
The docs have this to say about TransactionManagementError:
When exiting an atomic block, Django looks at whether it’s exited normally or with an exception to determine whether to commit or roll back.... If you attempt to run database queries before the rollback happens, Django will raise a TransactionManagementError.
But they also have this much vaguer thing to say about it as well:
TransactionManagementError is raised for any and all problems related to database transactions.
My questions, in order of ascending generality:
Will catching a DatabaseError actually address the race condition by letting the save() exit gracefully while still returning the object?
Where does the atomic block begin in the above code and where does it end?
What am I doing wrong and how can I fix it?
The Django docs on controlling transactions explicitly have an example of catching exceptions in atomic blocks.
In your case, you don't appear to be using the atomic decorator at all, so first you need to add the required import.
from django.db import transaction
Then you need to move the code that could raise a database error into an atomic block:
try:
    with transaction.atomic():
        cached_calculation.save()
        return cached_calculation
# if another thread beat you to saving this, catch the exception
# but return the object that was just calculated
except DatabaseError as error:
    log(error)
    return cached_calculation
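For completeness, here is a hedged sketch of how that change might slot back into the original calculate_and_cache function (it assumes the same calculate_slowly helper, Calculation model, and log function from the question, which are not shown here):

from django.db import DatabaseError, transaction

def calculate_and_cache(template, template_response):
    # run a fairly slow and intensive calculation:
    calculated_object = calculate_slowly(template, template_response)
    cached_calculation = Calculation(calculated=calculated_object,
                                     template=template,
                                     template_response=template_response)
    try:
        # the inner atomic block confines the failed INSERT to its own
        # (sub)transaction, so the surrounding transaction stays usable
        with transaction.atomic():
            cached_calculation.save()
        return cached_calculation
    except DatabaseError as error:
        # another thread beat us to saving this row: log it and return
        # the object we just calculated anyway
        log(error)
        return cached_calculation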
I'm using the sqlalchemy-aurora-data-api to connect to aurora-postgresql-serverless, with SQLalchemy as an ORM.
For the most part, this has been working fine, but I keep hitting unexpected errors from the aurora_data_api (which sqlalchemy-aurora-data-api is built upon) during commits.
I've tried to handle this in the application logic by catching the exception and retrying; however, it is still failing:
from functools import wraps

from aurora_data_api.exceptions import DatabaseError
from botocore.exceptions import ClientError


def handle_invalid_transaction_id(func):
    retries = 3

    @wraps(func)
    def inner(*args, **kwargs):
        for i in range(retries):
            try:
                return func(*args, **kwargs)
            except (DatabaseError, ClientError):
                if i < retries - 1:
                    # The aim here is to try to force a new transaction
                    # if an error occurs, and then retry
                    db.session.close()
                else:
                    raise
    return inner
And then in my models doing something like this:
class MyModel(db.Model):
    @classmethod
    @handle_invalid_transaction_id
    def create(cls, **kwargs):
        instance = cls(**kwargs)
        db.session.add(instance)
        db.session.commit()
        db.session.close()
        return kwargs
However, I keep hitting unpredictable transaction failures:
DatabaseError: (aurora_data_api.exceptions.DatabaseError) An error occurred (BadRequestException) when calling the ExecuteStatement operation: Transaction AXwQlogMJsPZgyUXCYFg9gUq4/I9FBEUy1zjMTzdZriEuBCF44s+wMX7+aAnyyJH/6arYcHxbCLW73WE8oRYsPMN17MOrqWfUdxkZRBrM/vBUfrP8FKv6Phfr6kK6o7/0mirCtRJUxDQAQPotaeP+hHj6/IOGUCaOnodt4M3015c0dAycuqhsy4= is not found [+26ms]
It is worth noting that these are not particularly long-running transactions, so I do not think that I'm hitting the transaction expiry issue that can occur with aurora-serverless as documented here.
Is there something fundamentally wrong with my approach to this or is there a better way to handle transactions failures when they occur?
Just to close this off, and in case it helps anyone else: I found that the issue was in the transactions being created by the cursor here.
I can't answer the why, but we noticed that transactions were expiring despite the fact that the data had been committed successfully, e.g.:
request 1 - creates a bunch of transactions, write data, exits.
request 2 - creates a bunch of transactions, some transaction id for request 1 fails, exits.
So yeah, I don't think the issue is with aurora-data-api, but rather something to do with transaction management in general in aurora-serverless. In the end, we forked the repo and refactored so that everything is handled with ExecuteStatement calls rather than transactions. It's been working fine so far (note we're using SQLAlchemy, so transactions are handled at the ORM level anyway).
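For illustration only, going straight through the Data API with boto3 looks roughly like this; when no transactionId is passed, each statement auto-commits on its own, so there is no long-lived transaction that can go missing between calls (the ARNs, database, and table below are placeholders, not from our code):

import boto3

client = boto3.client("rds-data")

client.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",         # placeholder
    secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",  # placeholder
    database="mydb",                                                             # placeholder
    sql="INSERT INTO my_table (name) VALUES (:name)",
    parameters=[{"name": "name", "value": {"stringValue": "example"}}],
)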
We've been experimenting with SQLAlchemy's disconnect handling and how it integrates with the ORM. We've studied the docs, and the advice seems to be to catch the disconnect exception, issue a rollback(), and retry the code.
eg:
import sqlalchemy as SA

retry = 2
while retry:
    retry -= 1
    try:
        for name in session.query(Names):
            print name
        break
    except SA.exc.DBAPIError as exc:
        if retry and exc.connection_invalidated:
            session.rollback()
        else:
            raise
I follow the rationale -- you have to roll back any active transactions and replay them to ensure a consistent ordering of your actions.
BUT -- this means a lot of extra code added to every function that wants to work with data. Furthermore, in the case of SELECT, we're not modifying data and the concept of rollback/re-request is not only unsightly, but a violation of the principle of DRY (don't repeat yourself).
I was wondering if others would mind sharing how they handle disconnects with sqlalchemy.
FYI: we're using sqlalchemy 0.9.8 and Postgres 9.2.9
The way I like to approach this is to place all my database code in a lambda or closure, and pass that into a helper function that handles catching the disconnect exception and retrying.
So with your example:
import sqlalchemy as SA


def main():
    def query():
        for name in session.query(Names):
            print name
    run_query(query)


def run_query(f, attempts=2):
    while attempts > 0:
        attempts -= 1
        try:
            return f()  # "break" if query was successful and return any results
        except SA.exc.DBAPIError as exc:
            if attempts > 0 and exc.connection_invalidated:
                session.rollback()
            else:
                raise
You can make this more fancy by passing a boolean into run_query to handle the case where you are only doing a read, and therefore want to retry without rolling back.
This helps you satisfy the DRY principle since all the ugly boiler-plate code for managing retries + rollbacks is placed in one location.
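A possible shape for that boolean, sketched under the assumption that skipping the rollback is safe for your driver (on some setups an invalidated connection still needs session.rollback() before the session can be reused, so treat this as illustrative only):

def run_query(f, attempts=2, read_only=False):
    while attempts > 0:
        attempts -= 1
        try:
            return f()
        except SA.exc.DBAPIError as exc:
            if attempts > 0 and exc.connection_invalidated:
                if not read_only:
                    # writes may have partially applied, so undo them before retrying
                    session.rollback()
            else:
                raise

# plain SELECTs can then opt out of the rollback:
run_query(query, read_only=True)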
Using exponential backoff (https://github.com/litl/backoff):
import logging

import backoff
import sqlalchemy.exc


@backoff.on_exception(
    backoff.expo,
    sqlalchemy.exc.DBAPIError,
    factor=7,
    max_tries=3,
    on_backoff=lambda details: LocalSession.get_main_sql_session().rollback(),
    on_giveup=lambda details: LocalSession.get_main_sql_session().flush(),  # flush the session
    logger=logging,
)
def pessimistic_insertion(document_metadata):
    LocalSession.get_main_sql_session().add(document_metadata)
    LocalSession.get_main_sql_session().commit()
Assuming that LocalSession.get_main_sql_session() returns a singleton.
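In case it helps, a minimal sketch of what such a LocalSession helper could look like, assuming a scoped_session over a single engine (the class, connection string, and method name are illustrative, not taken from any real codebase):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker


class LocalSession:
    _session = None

    @classmethod
    def get_main_sql_session(cls):
        # lazily create one shared session registry for the whole process
        if cls._session is None:
            engine = create_engine("postgresql://user:password@localhost/mydb")
            cls._session = scoped_session(sessionmaker(bind=engine))
        return cls._session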
Using Django on a MySQL database I get the following error:
OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The fault arises in the following code:
start_time = 1422086855
end_time = 1422088657

self.model.objects.filter(
    user=self.user,
    timestamp__gte=start_time,
    timestamp__lte=end_time).delete()

for sample in samples:
    o = self.model(user=self.user)
    o.timestamp = sample.timestamp
    ...
    o.save()
I have several parallel processes working on the same database, and sometimes they might have the same job or an overlap in sample data. That's why I need to clear the database and then store the new samples, since I don't want any duplicates.
I'm running the whole thing in a transaction block with transaction.commit_on_success() and am getting the OperationalError exception quite often. What I'd prefer is that the transaction doesn't end up in a deadlock, but instead just locks and waits for the other process to be finished with its work.
From what I've read I should order the locks correctly, but I'm not sure how to do this in Django.
What is the easiest way to ensure that I'm not getting this error while still making sure that I don't lose any data?
Use select_for_update() method:
samples = self.model.objects.select_for_update().filter(
    user=self.user,
    timestamp__gte=start_time,
    timestamp__lte=end_time)

for sample in samples:
    # do something with a sample
    sample.save()
Note that you shouldn't delete the selected samples and create new ones; just update the filtered records. The lock on these records will be released when your transaction is committed.
BTW instead of __gte/__lte lookups you can use __range:
samples = self.model.objects.select_for_update().filter(
    user=self.user,
    timestamp__range=(start_time, end_time))
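Also note that select_for_update() only works inside a transaction, so in practice the loop would be wrapped in transaction.atomic(); a rough sketch, assuming the same model and fields as in the question:

from django.db import transaction

with transaction.atomic():
    samples = self.model.objects.select_for_update().filter(
        user=self.user,
        timestamp__range=(start_time, end_time))
    for sample in samples:
        # ...update fields on the existing row instead of delete + recreate...
        sample.save()
# the row locks are released here, once the transaction commits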
To avoid deadlocks, I implemented a way of retrying a query when a deadlock happens.
To do this, I monkey-patched the execute method of Django's CursorWrapper class. This method is called whenever a query is made, so it works across the entire ORM and you won't have to worry about deadlocks across your project:
import time

import django.db.backends.utils
from django.db import OperationalError

original = django.db.backends.utils.CursorWrapper.execute


def execute_wrapper(*args, **kwargs):
    attempts = 0
    while attempts < 3:
        try:
            return original(*args, **kwargs)
        except OperationalError as e:
            code = e.args[0]
            if attempts == 2 or code != 1213:
                raise e
            attempts += 1
            time.sleep(0.2)


django.db.backends.utils.CursorWrapper.execute = execute_wrapper
What the code above does: it tries to run the query, and if an OperationalError is thrown with error code 1213 (a deadlock), it waits 200 ms and tries again. It does this up to 3 times, and if the problem still isn't resolved, the original exception is raised.
This code should be executed when the Django project is loaded into memory, so a good place to put it is the __init__.py file of one of your apps (I placed it in the __init__.py file of my project's main directory, the one that has the same name as your Django project).
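If you prefer something more explicit than an app's __init__.py, one alternative (my suggestion, not part of the original answer) is to apply the patch from an AppConfig.ready() hook, which Django calls once at startup:

# my_app/apps.py
from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = "my_app"

    def ready(self):
        # importing the module runs the CursorWrapper monkey patch above;
        # "deadlock_retry" is a hypothetical module holding that code
        from . import deadlock_retry  # noqa: F401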
Hope this helps anyone in the future.
When begin_nested is used as a context manager, e.g.

with db.session.begin_nested():
    # do something

If an IntegrityError is thrown, will db.session.rollback() be called automatically? Conversely, if no exception is thrown, will db.session.commit() be called automatically?
If a transaction, such as one from begin_nested, is used as a context manager, the transaction is committed at exit, or rolled back if there was an error in the block or during the commit.
Here is the relevant source: https://github.com/zzzeek/sqlalchemy/blob/81518ae2e2bc622f8cd47287a575ad4c0e43ead1/lib/sqlalchemy/orm/session.py#L558-L569
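A short, hedged sketch of what that behaviour looks like in practice (duplicate_row is a made-up object assumed to violate a unique constraint):

from sqlalchemy.exc import IntegrityError

try:
    # begin_nested() opens a SAVEPOINT; leaving the block normally releases it,
    # and an exception inside the block rolls back to the savepoint instead
    with db.session.begin_nested():
        db.session.add(duplicate_row)
except IntegrityError:
    # the savepoint has already been rolled back; the outer transaction is still usable
    pass

db.session.commit()  # commit the outer transaction as usual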
I want to know the best way of checking a condition in a Python function and preventing further execution if the condition is not satisfied. Right now I am following the scheme below, but it prints the whole stack trace. I want it to print only an error message and not execute the rest of the code. Is there a cleaner solution for doing this?
def Mydef(n1, n2):
    if n1 > n2:
        raise ValueError("Arg1 should be less than Arg2")
    # Some Code

Mydef(2, 1)
That is what exceptions are created for. Your scheme of raising an exception is good in general; you just need to add some code to catch and process it:
try:
    Mydef(2, 1)
except ValueError, e:
    # Do some stuff when the exception is raised; e.message will contain your message
    print e.message

In this case, execution of Mydef stops when it encounters the raise ValueError line, and control goes to the code block under except.
You can read more about exceptions processing in the documentation.
If you don't want to deal with exception handling, you can gracefully stop the function from executing further code with a return statement.
def Mydef(n1, n2):
    if n1 > n2:
        return

Or, if you also want to report the problem without raising:

def Mydef(n1, n2):
    if n1 > n2:
        print "Arg1 should be less than Arg2"
        return None
    # Some Code

Mydef(2, 1)
Functions stop executing when they reach a return statement or when they run to the end of the definition. You should read about flow control in general (not specifically to Python).