Using Django on a MySQL database I get the following error:
OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The error arises in the following code:
start_time = 1422086855
end_time = 1422088657

self.model.objects.filter(
    user=self.user,
    timestamp__gte=start_time,
    timestamp__lte=end_time).delete()

for sample in samples:
    o = self.model(user=self.user)
    o.timestamp = sample.timestamp
    ...
    o.save()
I have several parallel processes working on the same database, and sometimes they might have the same job or an overlap in sample data. That's why I need to clear the database and then store the new samples, since I don't want any duplicates.
I'm running the whole thing in a transaction block with transaction.commit_on_success() and am getting the OperationalError exception quite often. What I'd prefer is that the transaction doesn't end up in a deadlock, but instead just locks and waits for the other process to be finished with its work.
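For reference, the whole block is wrapped roughly like this (a simplified sketch of my code):

from django.db import transaction

# Simplified: the delete and the re-inserts run inside one transaction
with transaction.commit_on_success():
    self.model.objects.filter(
        user=self.user,
        timestamp__gte=start_time,
        timestamp__lte=end_time).delete()
    for sample in samples:
        o = self.model(user=self.user)
        o.timestamp = sample.timestamp
        o.save()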
From what I've read I should order the locks correctly, but I'm not sure how to do this in Django.
What is the easiest way to ensure that I'm not getting this error while still making sure that I don't lose any data?
Use the select_for_update() method:
samples = self.model.objects.select_for_update().filter(
    user=self.user,
    timestamp__gte=start_time,
    timestamp__lte=end_time)

for sample in samples:
    # do something with a sample
    sample.save()
Note that you shouldn't delete the selected samples and create new ones; just update the filtered records in place. The lock on these records will be released when your transaction is committed.
By the way, instead of the __gte/__lte lookups you can use __range:
samples = self.model.objects.select_for_update().filter(
    user=self.user,
    timestamp__range=(start_time, end_time))
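Put together, the update-in-place pattern looks roughly like this (a sketch; transaction.atomic is the Django 1.6+ equivalent of your commit_on_success block):

from django.db import transaction

# Sketch: lock the matching rows, update them in place, and let the
# commit at the end of the block release the row locks.
with transaction.atomic():
    samples_qs = self.model.objects.select_for_update().filter(
        user=self.user,
        timestamp__range=(start_time, end_time))
    for sample in samples_qs:
        # update the fields instead of deleting and recreating the row
        sample.save()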
To avoid deadlocks, I implemented a way of retrying a query when a deadlock happens.
To do this, I monkey-patched the execute method of Django's CursorWrapper class. This method is called whenever a query is made, so the retry works across the entire ORM and you won't have to worry about deadlocks anywhere in your project:
import django.db.backends.utils
from django.db import OperationalError
import time

original = django.db.backends.utils.CursorWrapper.execute

def execute_wrapper(*args, **kwargs):
    attempts = 0
    while attempts < 3:
        try:
            return original(*args, **kwargs)
        except OperationalError as e:
            code = e.args[0]
            if attempts == 2 or code != 1213:
                raise e
            attempts += 1
            time.sleep(0.2)

django.db.backends.utils.CursorWrapper.execute = execute_wrapper
What the code above does is: it tries to run the query, and if an OperationalError with error code 1213 (a deadlock) is raised, it waits 200 ms and tries again. It does this up to 3 times, and if the problem is still not solved after 3 attempts, the original exception is raised.
This code should be executed when the Django project is being loaded into memory, so a good place to put it is the __init__.py file of any of your apps (I placed it in the __init__.py file of my project's main directory, the one that has the same name as your Django project).
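If you'd rather not patch from an __init__.py, another place that runs once at startup (on Django 1.7+) is an AppConfig.ready() method; a sketch, where myapp and deadlock_retry are hypothetical names:

# myapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"  # hypothetical app name

    def ready(self):
        # importing the module applies the CursorWrapper.execute patch above
        import myapp.deadlock_retry  # noqa: F401  (hypothetical module)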
Hope this helps anyone in the future.
I'm using the sqlalchemy-aurora-data-api to connect to aurora-postgresql-serverless, with SQLalchemy as an ORM.
For the most part, this has been working fine, but I keep hitting unexpected errors from the aurora_data_api (which sqlalchemy-aurora-data-api is built upon) during commits.
I've tried to handle this in the application logic by catching the exception and retrying; however, this is still failing:
from functools import wraps

from aurora_data_api.exceptions import DatabaseError
from botocore.exceptions import ClientError

def handle_invalid_transaction_id(func):
    retries = 3

    @wraps(func)
    def inner(*args, **kwargs):
        for i in range(retries):
            try:
                return func(*args, **kwargs)
            except (DatabaseError, ClientError):
                if i != retries - 1:
                    # The aim here is to try and force a new transaction
                    # if an error occurs, and retry
                    db.session.close()
                else:
                    raise
    return inner
And then in my models doing something like this:
class MyModel(db.Model):
    @classmethod
    @handle_invalid_transaction_id
    def create(cls, **kwargs):
        instance = cls(**kwargs)
        db.session.add(instance)
        db.session.commit()
        db.session.close()
        return kwargs
However, I keep hitting unpredictable transaction failures:
DatabaseError: (aurora_data_api.exceptions.DatabaseError) An error occurred (BadRequestException) when calling the ExecuteStatement operation: Transaction AXwQlogMJsPZgyUXCYFg9gUq4/I9FBEUy1zjMTzdZriEuBCF44s+wMX7+aAnyyJH/6arYcHxbCLW73WE8oRYsPMN17MOrqWfUdxkZRBrM/vBUfrP8FKv6Phfr6kK6o7/0mirCtRJUxDQAQPotaeP+hHj6/IOGUCaOnodt4M3015c0dAycuqhsy4= is not found [+26ms]
It is worth noting that these are not particularly long-running transactions, so I do not think that I'm hitting the transaction expiry issue that can occur with aurora-serverless as documented here.
Is there something fundamentally wrong with my approach to this or is there a better way to handle transactions failures when they occur?
Just to close this off, and in case it helps anyone else: the issue turned out to be with the transactions that were being created by the cursor here.
I can't answer the why, but we noticed that transactions were expiring despite the fact that the data had committed successfully, e.g.:
request 1 - creates a bunch of transactions, writes data, exits.
request 2 - creates a bunch of transactions, a transaction id from request 1 fails, exits.
So yeah, I don't think the issue is with aurora-data-api, but somehow to do with transaction management in general in aurora-serverless. In the end, we forked the repo and refactored so that everything is handled with ExecuteStatement calls rather than transactions. It's been working fine so far (note we're using SQLAlchemy, so transactions are handled at the ORM level anyway).
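For illustration, the direction we took is roughly equivalent to issuing standalone Data API calls like this with boto3 (a sketch; the ARNs, database and table names are placeholders). With no transactionId supplied, each ExecuteStatement call is auto-committed on its own:

import boto3

# Each statement is sent as its own ExecuteStatement call (no transactionId),
# so Aurora Serverless auto-commits it. ARNs and names are placeholders.
client = boto3.client("rds-data")

def run_statement(sql, parameters=None):
    return client.execute_statement(
        resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
        secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
        database="mydb",
        sql=sql,
        parameters=parameters or [],
    )

run_statement(
    "INSERT INTO my_table (name) VALUES (:name)",
    [{"name": "name", "value": {"stringValue": "example"}}],
)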
I have a big problem with a deadlock in an InnoDB table used with sqlalchemy.
sqlalchemy.exc.InternalError: (mysql.connector.errors.InternalError) 1213 (40001): Deadlock found when trying to get lock; try restarting transaction.
I have already serialized the access, but still get a deadlock error.
This code is executed as the first call in every function. Every thread and process should wait here until it gets the lock. It's simplified, as selectors are removed.
# The work item with index -1 always exists.
f = s.query(WorkerInProgress).with_for_update().filter(
    WorkerInProgress.offset == -1).first()
I have reduced my code to a minimal state. I am currently running only concurrent calls on the method next_slice. Session handling, rollback and deadlock handling are handled outside.
I get deadlocks even though all access is serialized. I also tried incrementing a retry counter in the offset == -1 entity.
def next_slice(self, s, processgroup_id, itemcount):
    f = s.query(WorkerInProgress).with_for_update().filter(
        WorkerInProgress.offset == -1).first()

    # Take the first matching object if available / maybe some workers failed
    item = s.query(WorkerInProgress).with_for_update().filter(
        WorkerInProgress.processgroup_id != processgroup_id,
        WorkerInProgress.processgroup_id != 'finished',
        WorkerInProgress.processgroup_id != 'finished!locked',
        WorkerInProgress.offset != -1
    ).order_by(WorkerInProgress.offset.asc()).limit(1).first()

    # *****
    # Some code is missing here, as it's not executed in my test case.

    # Fetch the latest item and add a new one
    item = s.query(WorkerInProgress).with_for_update().order_by(
        WorkerInProgress.offset.desc()).limit(1).first()

    new = WorkerInProgress()
    new.offset = item.offset + item.count
    new.count = itemcount
    new.maxtries = 3
    new.processgroup_id = processgroup_id

    s.add(new)
    s.commit()

    return new.offset, new.count
I don't understand why the deadlocks are occurring.
I have reduced the number of deadlocks by fetching all items in one query, but I still get them. Perhaps someone can help me.
Finally I solved my problem. It's all in the documentation, but I had to understand it first.
Always be prepared to re-issue a transaction if it fails due to
deadlock. Deadlocks are not dangerous. Just try again.
Source: http://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html
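In practice that means wrapping the call in a retry loop, something like this sketch (worker is whatever object holds next_slice; the error-code check may need adjusting for your driver):

import time
from sqlalchemy.exc import DBAPIError

def next_slice_with_retry(worker, s, processgroup_id, itemcount, retries=3):
    # Re-issue the whole transaction when MySQL reports a deadlock (1213),
    # as the InnoDB documentation recommends.
    for attempt in range(retries):
        try:
            return worker.next_slice(s, processgroup_id, itemcount)
        except DBAPIError as exc:
            s.rollback()
            if "1213" not in str(exc.orig) or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # small backoff before retrying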
I solved my problem by changing the architecture of this part. I still get a lot of deadlocks, but they now occur almost exclusively in the short-running methods.
I have split my worker table into a locking and a non-locking part. The actions on the locking part are now very short, and no data handling takes place during the get_slice, finish_slice and fail_slice operations.
The transactions that handle data are now in the non-locking part and run without concurrent access to table rows. The results are written back to the locking table by finish_slice and fail_slice.
Finally, I found a good description on Stack Overflow too, after identifying the right search terms:
https://stackoverflow.com/a/2596101/5532934
I have the following (paraphrased) code that's subject to race conditions:
def calculate_and_cache(template, template_response):
    # run a fairly slow and intensive calculation:
    calculated_object = calculate_slowly(template, template_response)
    cached_calculation = Calculation(calculated=calculated_object,
                                     template=template,
                                     template_response=template_response)
    # try to save the calculation just computed:
    try:
        cached_calculation.save()
        return cached_calculation
    # if another thread beat you to saving this, catch the exception
    # but return the object that was just calculated
    except DatabaseError as error:
        log(error)
        return cached_calculation
And it's raising a TransactionManagementError:
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.
The docs have this to say about it:
When exiting an atomic block, Django looks at whether it’s exited normally or with an exception to determine whether to commit or roll back.... If you attempt to run database queries before the rollback happens, Django will raise a TransactionManagementError.
But they also have this much vaguer thing to say about it:
TransactionManagementError is raised for any and all problems related to database transactions.
My questions, in order of ascending generality:
Will catching a DatabaseError actually address the race condition by letting the save() exit gracefully while still returning the object?
Where does the atomic block begin in the above code and where does it end?
What am I doing wrong and how can I fix it?
The Django docs on controlling transactions explicitly have an example of catching exceptions in atomic blocks.
In your case, you don't appear to be using the atomic decorator at all, so first you need to add the required import.
from django.db import transaction
Then you need to move the code that could raise a database error into an atomic block:
try:
    with transaction.atomic():
        cached_calculation.save()
        return cached_calculation
# if another thread beat you to saving this, catch the exception
# but return the object that was just calculated
except DatabaseError as error:
    log(error)
    return cached_calculation
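Putting that back into the original function, the end of calculate_and_cache would look roughly like this (a sketch; the inner atomic block rolls back to a savepoint on error, so the surrounding transaction stays usable):

from django.db import DatabaseError, transaction

def calculate_and_cache(template, template_response):
    calculated_object = calculate_slowly(template, template_response)
    cached_calculation = Calculation(calculated=calculated_object,
                                     template=template,
                                     template_response=template_response)
    try:
        # only the save is wrapped; a failure rolls back to the savepoint
        with transaction.atomic():
            cached_calculation.save()
    except DatabaseError as error:
        log(error)
    return cached_calculation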
SQLAlchemy (0.9.8), mysql-5.6.21-osx10.8-x86_64, and Mac OS X 10.10.3 (Yosemite).
I keep getting intermittent:
InterfaceError: (InterfaceError) 2013: Lost connection to MySQL server during query u'SELECT..... '
I have read up on a few threads, and most cases are resolved by adding this to my.cnf:
max_allowed_packet = 1024M
which should be more than big enough for what I'm trying to do. After doing this, I still hit it intermittently. I also put these lines in /etc/my.cnf:
log-error = "/Users/<myname>/tmp/mysql.err.log"
log-warnings = 3
I am hoping to get more details, but all I see is something like this:
[Warning] Aborted connection 444 to db: 'dbname' user: 'root' host: 'localhost' (Got an error reading communication packets)
I have reached the point where I think more detailed or better logging may help, or maybe there's something else I could try before that.
Thanks.
Looks like your MySQL connection is timing out after a long period of inactivity; I bet it wouldn't happen if you were constantly querying the DB with your existing settings. There are a couple of settings on both the MySQL and SQLAlchemy sides which should resolve this issue:
Check your SQLAlchemy engine's pool_recycle value and try a different/smaller value, e.g. 1800 (seconds). If you're reading DB settings from a file, set it as
pool_recycle: 1800
otherwise specify it during engine init, e.g.
from sqlalchemy import create_engine
e = create_engine("mysql://user:pass@localhost/db", pool_recycle=1800)
Check/modify your wait_timeout MySQL variable (see https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_wait_timeout), which is the number of seconds the server waits for activity on a noninteractive connection before closing it, e.g.
show global variables like 'wait_timeout';
Find a combination that works for your environment.
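One way to line the two settings up is to read wait_timeout from the server and keep pool_recycle a bit below it; a sketch (the connection URL is a placeholder):

from sqlalchemy import create_engine, text

# Read the server's wait_timeout and recycle pooled connections earlier
# than that, so SQLAlchemy never hands out a connection MySQL has closed.
tmp_engine = create_engine("mysql://user:pass@localhost/db")
with tmp_engine.connect() as conn:
    row = conn.execute(text("SHOW VARIABLES LIKE 'wait_timeout'")).fetchone()
    wait_timeout = int(row[1])

engine = create_engine("mysql://user:pass@localhost/db",
                       pool_recycle=max(wait_timeout - 60, 60))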
There are two parameters that can help: pool_recycle and pool_pre_ping.
pool_recycle sets the number of seconds after which an idle connection is recycled. MySQL's timeout defaults to 8 hours, while SQLAlchemy's pool_recycle defaults to -1, meaning never recycle; that mismatch is the problem. If MySQL has closed the connection but SQLAlchemy has not recycled it, the Lost connection exception is raised.
pool_pre_ping tests a connection's liveness before handing it out. As I understand it, this can serve as a back-up strategy: if a connection was closed by MySQL but SQLAlchemy hasn't noticed, the check detects it and SQLAlchemy avoids using the invalid connection.
create_engine(<mysql conn url>, pool_recycle=60 * 5, pool_pre_ping=True)
Based on suggestions from this, this and many other articles on the internet, wrapping all my functions with the following decorator helped me resolve the "Lost Connection" issue with MariaDB as the backend db. Please note that db below is an instance of flask_sqlalchemy.SQLAlchemy, but the concept remains the same for a plain SQLAlchemy session too.
def manage_session(f):
    def inner(*args, **kwargs):
        # MANUAL PRE PING
        try:
            db.session.execute("SELECT 1;")
            db.session.commit()
        except:
            db.session.rollback()
        finally:
            db.session.close()

        # SESSION COMMIT, ROLLBACK, CLOSE
        try:
            res = f(*args, **kwargs)
            db.session.commit()
            return res
        except Exception as e:
            db.session.rollback()
            raise e
            # OR return traceback.format_exc()
        finally:
            db.session.close()
    return inner
I also added a pool_recycle of 50 seconds in the Flask-SQLAlchemy config, but that didn't visibly contribute to the solution.
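For reference, this is roughly how those pool options can be set in the Flask-SQLAlchemy config (SQLALCHEMY_ENGINE_OPTIONS needs Flask-SQLAlchemy 2.4+; older versions have a separate SQLALCHEMY_POOL_RECYCLE key; values are just examples):

class Config:
    SQLALCHEMY_DATABASE_URI = "mysql://user:pass@localhost/db"  # placeholder
    SQLALCHEMY_ENGINE_OPTIONS = {
        "pool_recycle": 50,       # recycle connections after 50 s
        "pool_pre_ping": True,    # test connections before using them
    }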
EDIT1:
Below is a sample snippet of how it was used in the final code:
from flask_restful import Resource

class DataAPI(Resource):
    @manage_session
    def get(self):
        # Get data rows from DB
None of the previous solutions worked. I managed to solve it and developed a theory. I consider myself a layman in MySQL architecture so if you understand better, please complement my suggestion.
In my case I was getting this error, but the query in question was not the problem, and neither was the query before it. What happened is that I had saved the results of some earlier queries in instances, and I believe this kept a connection to the database open. After a long stretch of processing I only performed another query minutes later.
That connection ended up dying without warning, and when a new query was attempted MySQL threw this error. For some reason increasing the connection timeout did not help. What fixed the problem was making empty commits along the way:
db.session.commit()
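Concretely, the fix amounts to sprinkling no-op commits between the long processing steps, so the connection never sits idle long enough for MySQL to drop it; a sketch (the models and processing functions are placeholders):

# Placeholders: MyTable/Result models and the slow processing functions.
rows = db.session.query(MyTable).all()

intermediate = slow_processing_step(rows)   # takes minutes
db.session.commit()                         # empty commit, keeps the connection alive

final = another_slow_step(intermediate)
db.session.commit()                         # another keepalive

db.session.add(Result(value=final))
db.session.commit()                         # the real write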
I have this Django views.py method that aims to insert a lot of data into the db. It loops through arrays of models and, if an object isn't already in the db, it gets inserted.
This is what the code looks like:
def update_my_db(request):
    a_models = A_Model.objects.filter(my_flag=True)
    for a_model in a_models:
        b_model_array = []
        [...]  # this is where b_model_array gets filled
        for index in range(len(b_model_array)):
            current_b_model = b_model_array[index]
            try:
                b_model = B_Model.objects.get(my_field=current_b_model.my_field)
            except (KeyError, B_Model.DoesNotExist):
                b_model = B_Model.objects.create(field_1=current_b_model.field_1,
                                                 field_2=current_b_model.field_2)
                b_model.save()
    return HttpResponse(response)
I have noticed after several tests that the db is only updated at the end of the last iteration, as if Django waits to do a batch insert into MySQL.
The thing is: any of the iterations could raise an exception, and when that happens all the data gathered so far is discarded because of the error (already tested and confirmed). When adding 400 new rows, raising an exception at loop #399 and discarding all the previous 398 rows would be extremely undesirable for me.
I understand that batching would be the best choice concerning performance, but this is a background routine, so I'm not worried about it.
Bottomline: is there a way to actually force django to update the database on every iteration?
If you're on Django 1.6, check this out: https://docs.djangoproject.com/en/dev/topics/db/transactions/
You're interested in the context manager part of that page:
from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
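Applied to your loop, each iteration can get its own transaction, so an exception at item #399 no longer discards the 398 rows already written. A sketch based on your code (this assumes the view isn't already wrapped in an outer transaction, e.g. via ATOMIC_REQUESTS):

from django.db import transaction

def update_my_db(request):
    a_models = A_Model.objects.filter(my_flag=True)
    for a_model in a_models:
        b_model_array = []
        # ... fill b_model_array as before ...
        for current_b_model in b_model_array:
            # each iteration commits independently as soon as the
            # atomic block exits, so earlier rows survive later failures
            with transaction.atomic():
                try:
                    B_Model.objects.get(my_field=current_b_model.my_field)
                except B_Model.DoesNotExist:
                    B_Model.objects.create(field_1=current_b_model.field_1,
                                           field_2=current_b_model.field_2)
    return HttpResponse(response)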