I have a big problem with a deadlock in an InnoDB table used with sqlalchemy.
sqlalchemy.exc.InternalError: (mysql.connector.errors.InternalError) 1213 (40001): Deadlock found when trying to get lock; try restarting transaction.
I have already serialized the access, but still get a deadlock error.
This code is executed on the first call in every function. Every thread and process should wait here, till it gets the lock. It's simplified, as selectors are removed.
# The work with the index -1 always exists.
f = s.query(WorkerInProgress).with_for_update().filter(
WorkerInProgress.offset == -1).first()
I have reduced my code to a minimal state. I am currently running only concurrent calls on the method next_slice. Session handling, rollback and deadloc handling are handled outside.
I get deadlocks even all access is serialized. I did tried to increment a retry counter in the offset == -1 entity as well.
def next_slice(self, s, processgroup_id, itemcount):
f = s.query(WorkerInProgress).with_for_update().filter(
WorkerInProgress.offset == -1).first()
#Take first matching object if available / Maybe some workers failed
item = s.query(WorkerInProgress).with_for_update().filter(
WorkerInProgress.processgroup_id != processgroup_id,
WorkerInProgress.processgroup_id != 'finished',
WorkerInProgress.processgroup_id != 'finished!locked',
WorkerInProgress.offset != -1
).order_by(WorkerInProgress.offset.asc()).limit(1).first()
# *****
# Some code is missing here. as it's not executed in my testcase
# Fetch the latest item and add a new one
item = s.query(WorkerInProgress).with_for_update().order_by(
WorkerInProgress.offset.desc()).limit(1).first()
new = WorkerInProgress()
new.offset = item.offset + item.count
new.count = itemcount
new.maxtries = 3
new.processgroup_id = processgroup_id
s.add(new)
s.commit()
return new.offset, new.count
I don't understand why the deadlocks are occurring.
I have reduced deadlock by fetching all items in one query, but still get deadlocks. Perhaps someone can help me.
Finally I solved my problem. It's all in the documentation, but I have to understand it first.
Always be prepared to re-issue a transaction if it fails due to
deadlock. Deadlocks are not dangerous. Just try again.
Source: http://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html
I have solved my problem by changing the architecture of this part. I still get a lot of deadlocks, but they appear almost in the short running methods.
I have splitted my worker table to a locking and an non locking part. The actions on the locking part are now very short and no data is handling during the get_slice, finish_slice and fail_slice operations.
The transaction part with data handling are now in a non locking part and without concurrent access to table rows. The results are stored in finish_slice and fail_slice to the locking table.
Finally I have found a good description on stackoverflow too. After identifying the right search terms.
https://stackoverflow.com/a/2596101/5532934
Related
#receiver(post_save, sender=MyRequestLog)
def steptwo_launcher(sender, instance, **kwargs):
GeneralLogging(calledBy='MyRequestLog', logmsg='enter steptwo_launcher').save() # remember to remove this line
if instance.stepCode == 100:
GeneralLogging(calledBy='MyRequestLog', logmsg='step code 100 found. launch next step').save()
nextStep.delay(instance.requestId,False)
I think I just witness my code losing a race condition. The backend of my application updates the status of task one, and it writes a stepCode of 100 to the log when the next task should be started. The front end of the application polls to report the current step back to the end user.
It appears that after the backend created an entry with stepCode 100, the front request came in so soon after, that the if instance.stepCode == 100: was never found to be True. The GeneralLogging only reports one entry at the time of the suspected collision and does not launch the nextstep.
My question is to 1) Confirm this is possible, which I already suspect. and 2) A way to fix this so nextStep is not skipped due to the race condition.
This question lacks a bunch of potentially useful information (e.g. missing code, missing output), but any code of the form
if state == x:
change_state
has a potential race condition when multiple control paths hit this code.
Two of the most common ways to handle this problem are (1) locks:
with some_lock:
if state:
change_state
i.e. stop everyone else from hitting this code until we're done, and (2) queues:
queue.push(item_to_be_processed)
# somewhere else
item_to_be_processed = queue.pop()
A queue/lock implementation in a db could use select_for_update and use an extra processed field, i.e. let the "writer" process save the model with processed = False and have the "reader" process do:
from django.db import transaction
...
with transaction.atomic():
items = MyRequestLog.objects.select_for_update(skip_locked=True).filter(
stepCode=100,
processed=False
)
for item in items:
do_something_useful(item) # you might want to pull this outside of the atomic block if your application allows (so you don't keep rows locked for an extended period)
item.processed = True
item.save()
ps: check your database for support (https://docs.djangoproject.com/en/2.0/ref/models/querysets/#select-for-update)
Using Django on a MySQL database I get the following error:
OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The fault rises in the following code:
start_time = 1422086855
end_time = 1422088657
self.model.objects.filter(
user=self.user,
timestamp__gte=start_time,
timestamp__lte=end_time).delete()
for sample in samples:
o = self.model(user=self.user)
o.timestamp = sample.timestamp
...
o.save()
I have several parallell processes working on the same database and sometimes they might have the same job or an overlap in sample data. That's why I need to clear the database and then store the new samples since I don't want any duplicates.
I'm running the whole thing in a transaction block with transaction.commit_on_success() and am getting the OperationalError exception quite often. What I'd prefer is that the transaction doesn't end up in a deadlock, but instead just locks and waits for the other process to be finished with its work.
From what I've read I should order the locks correctly, but I'm not sure how to do this in Django.
What is the easiest way to ensure that I'm not getting this error while still making sure that I don't lose any data?
Use select_for_update() method:
samples = self.model.objects.select_for_update().filter(
user=self.user,
timestamp__gte=start_time,
timestamp__lte=end_time)
for sample in samples:
# do something with a sample
sample.save()
Note that you shouldn't delete selected samples and create new ones. Just update the filtered records. Lock for these records will be released then your transaction will be committed.
BTW instead of __gte/__lte lookups you can use __range:
samples = self.model.objects.select_for_update().filter(
user=self.user,
timestamp__range=(start_time, end_time))
To avoid deadlocks, what I did was implement a way of retrying a query in case a deadlock happens.
In order to do this, what I did was I monkey patched the method "execute" of django's CursorWrapper class. This method is called whenever a query is made, so it will work across the entire ORM and you won't have to worry about deadlocks across your project:
import django.db.backends.utils
from django.db import OperationalError
import time
original = django.db.backends.utils.CursorWrapper.execute
def execute_wrapper(*args, **kwargs):
attempts = 0
while attempts < 3:
try:
return original(*args, **kwargs)
except OperationalError as e:
code = e.args[0]
if attempts == 2 or code != 1213:
raise e
attempts += 1
time.sleep(0.2)
django.db.backends.utils.CursorWrapper.execute = execute_wrapper
What the code above does is: it will try running the query and if an OperationalError is thrown with the error code 1213 (a deadlock), it will wait for 200 ms and try again. It will do this 3 times and if after 3 times the problem was not solved, the original exception is raised.
This code should be executed when the django project is being loaded into memory and so a good place to put it is in the __ini__.py file of any of your apps (I placed in the __ini__.py file of my project's main directory - the one that has the same name as your django project).
Hope this helps anyone in the future.
(Sorry in advance for the long question. I tried to break it up into sections to make it clearer what I'm asking. Please let me know if I should add anything else or reorganize it at all.)
Background:
I'm writing a web crawler that uses a producer/consumer model with jobs (pages to crawl or re-crawl) stored in a postgresql database table called crawler_table. I'm using SQLAlchemy to access and make changes to the database table. The exact schema is not important for this question. The important thing is that I (will) have multiple consumers, each of which repeatedly selects a record from the table, loads the page with phantomjs, and then writes information about the page back to the record.
It can happen on occasion that two consumers select the same job. This is not itself a problem; however, it is important that if they update the record with their results simultaneously, that they make consistent changes. It's good enough for me to just find out if an update would cause the record to become inconsistent. If so, I can deal with it.
Investigation:
I initially assumed that if two transactions in separate sessions read then updated the same record simultaneously, the second one to commit would fail. To test that assumption, I ran the following code (simplified slightly):
SQLAlchemySession = sessionmaker(bind=create_engine(my_postgresql_uri))
class Session (object):
# A simple wrapper for use with `with` statement
def __enter__ (self):
self.session = SQLAlchemySession()
return self.session
def __exit__ (self, exc_type, exc_val, exc_tb):
if exc_type:
self.session.rollback()
else:
self.session.commit()
self.session.close()
with Session() as session: # Create a record to play with
if session.query(CrawlerPage) \
.filter(CrawlerPage.url == 'url').count() == 0:
session.add(CrawlerPage(website='website', url='url',
first_seen=datetime.utcnow()))
page = session.query(CrawlerPage) \
.filter(CrawlerPage.url == 'url') \
.one()
page.failed_count = 0
# commit
# Actual experiment:
with Session() as session:
page = session.query(CrawlerPage) \
.filter(CrawlerPage.url == 'url') \
.one()
print 'initial (session)', page.failed_count
# 0 (expected)
page.failed_count += 5
with Session() as other_session:
same_page = other_session.query(CrawlerPage) \
.filter(CrawlerPage.url == 'url') \
.one()
print 'initial (other_session)', same_page.failed_count
# 0 (expected)
same_page.failed_count += 10
print 'final (other_session)', same_page.failed_count
# 10 (expected)
# commit other_session, no errors (expected)
print 'final (session)', page.failed_count
# 5 (expected)
# commit session, no errors (why?)
with Session() as session:
page = session.query(CrawlerPage) \
.filter(CrawlerPage.url == 'url') \
.one()
print 'final value', page.failed_count
# 5 (expected, given that there were no errors)
(Apparently Incorrect) Expectations:
I would have expected that reading a value from a record then updating that value within the same transaction would:
Be an atomic operation. That is, either succeed completely or fail completely. This much appears to be true, since the final value is 5, the value set in the last transaction to be committed.
Fail if the record being updated is updated by a concurrent session (other_session) upon attempting to commit the transaction. My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit. In these circumstances, the two transactions read then update the same value of the same record. In a version-control system, this would be the equivalent of a merge conflict. Obviously databases are not the same as version-control systems, but they have enough similarities to inform some of my assumptions about them, for better or worse.
Questions:
Why doesn't the second commit raise an exception?
Am I misunderstanding something about how SQLAlchemy handles transactions?
Am I misunderstanding something about how postgresql handles transactions? (This one seems most likely to me.)
Something else?
Is there a way to get the second commit to raise an exception?
PostgreSQL has select . . . for update, which SQLAlchemy seems to support.
My rationale is that all transactions should behave as though they are
performed independently in order of commit whenever possible, or
should fail to commit.
Well, in general there's a lot more to transactions than that. PostgreSQL's default transaction isolation level is "read committed". Loosely speaking, that means multiple transactions can simultaneously read committed values from the same rows in a table. If you want to prevent that, set transaction isolation serializable (might not work), or select...for update, or lock the table, or use a column-by-column WHERE clause, or whatever.
You can test and demonstrate transaction behavior by opening two psql connections.
begin transaction; begin transaction;
select *
from test
where pid = 1
and date = '2014-10-01'
for update;
(1 row)
select *
from test
where pid = 1
and date = '2014-10-01'
for update;
(waiting)
update test
set date = '2014-10-31'
where pid = 1
and date = '2014-10-01';
commit;
-- Locks released. SELECT for update fails.
(0 rows)
I have this django views.py method that aims to insert many data into the db. It loops through arrays of models and, if an object isn't already on the db, it gets inserted.
This is what the code looks like:
def update_my_db(request):
a_models = A_Model.objects.filter(my_flag=True)
for a_model in a_models:
b_model_array = []
[...] # this is where b_model_array gets filled
for index in range(len(b_model_array)):
current_b_model = b_model_array[index]
try:
b_model = B_Model.objects.get(my_field=current_b_model.my_field)
except (KeyError, B_Model.DoesNotExist):
b_model = B_Model.objects.create(field_1=current_b_model.field_1, field_2=current_b_model.field_2)
b_model.save()
return HttpResponse(response)
I have noticed after several tests that the db is only updated by the end of the last iteration, as if django awaits to do a batch insert to mysql.
The thing is: there is a possibility of any of the iterations raising an exception, making all the data gathered so far be discarded because of the error (already tested and confirmed it). When it comes to adding 400 new lines, raising an exception at loop #399 and discarding all the previous 398 lines would be extremely undesirable for me.
I understand that batching would be the best choice concerning performance, but this is a background routine, so I'm not worried about it.
Bottomline: is there a way to actually force django to update the database on every iteration?
If you're on Django 1.6, check this out: https://docs.djangoproject.com/en/dev/topics/db/transactions/
You're interested in the context manager part of that page:
from django.db import transaction
def viewfunc(request):
# This code executes in autocommit mode (Django's default).
do_stuff()
with transaction.atomic():
# This code executes inside a transaction.
do_more_stuff()
I have a SQLAlchemy model that represents a file and thus contains the path to an actual file. Since deletion of the database row and file should go along (so no orphaned files are left and no rows point to deleted files) I added a delete() method to my model class:
def delete(self):
if os.path.exists(self.path):
os.remove(self.path)
db.session.delete(self)
This works fine but has one huge disadvantage: The file is deleted immediately before the transaction containing the database deletion is committed.
One option would be committing in the delete() method - but I don't want to do this since I might not be finished with the current transaction. So I'm looking for a way to delay the deletion of the physical file until the transaction deleting the row is actually committed.
SQLAlchemy has an after_delete event but according to the docs this is triggered when the SQL is emitted (i.e. on flush) which is too early. It also has an after_commit event but at this point everything deleted in the transaction has probably been deleted from SA.
When using SQLAlchemy in a Flask app with Flask-SQLAlchemy it provides a models_committed signal which receives a list of (model, operation) tuples. Using this signal doing what I'm looking for is extremely easy:
#models_committed.connect_via(app)
def on_models_committed(sender, changes):
for obj, change in changes:
if change == 'delete' and hasattr(obj, '__commit_delete__'):
obj.__commit_delete__()
With this generic function every model that needs on-delete-commit code now simply needs to have a method __commit_delete__(self) and do whatever it needs to do in that method.
It can also be done without Flask-SQLAlchemy, however, in this case it needs some more code:
A deletion needs to be recorded when it's performed. This is be done using the after_delete event.
Any recorded deletions need to be handled when a COMMIT is successful. This is done using the after_commit event.
In case the transaction fails or is manually rolled back the recorded changes also need to be cleared. This is done using the after_rollback() event.
This follows along with the other event-based answers, but I thought I'd post this code, since I wrote it to solve pretty much your exact problem:
The code (below) registers a SessionExtension class that accumulates all new, changed, and deleted objects as flushes occur, then clears or evaluates the queue when the session is actually committed or rolled back. For the classes which have an external file attached, I then implemented obj.after_db_new(session), obj.after_db_update(session), and/or obj.after_db_delete(session) methods which the SessionExtension invokes as appropriate; you can then populate those methods to take care of creating / saving / deleting the external files.
Note: I'm almost positive this could be rewritten in a cleaner manner using SqlAlchemy's new event system, and it has a few other flaws, but it's in production and working, so I haven't updated it :)
import logging; log = logging.getLogger(__name__)
from sqlalchemy.orm.session import SessionExtension
class TrackerExtension(SessionExtension):
def __init__(self):
self.new = set()
self.deleted = set()
self.dirty = set()
def after_flush(self, session, flush_context):
# NOTE: requires >= SA 0.5
self.new.update(obj for obj in session.new
if hasattr(obj, "after_db_new"))
self.deleted.update(obj for obj in session.deleted
if hasattr(obj, "after_db_delete"))
self.dirty.update(obj for obj in session.dirty
if hasattr(obj, "after_db_update"))
def after_commit(self, session):
# NOTE: this is rather hackneyed, in that it hides errors until
# the end, just so it can commit as many objects as possible.
# FIXME: could integrate this w/ twophase to make everything safer in case the methods fail.
log.debug("after commit: new=%r deleted=%r dirty=%r",
self.new, self.deleted, self.dirty)
ecount = 0
if self.new:
for obj in self.new:
try:
obj.after_db_new(session)
except:
ecount += 1
log.critical("error occurred in after_db_new: obj=%r",
obj, exc_info=True)
self.new.clear()
if self.deleted:
for obj in self.deleted:
try:
obj.after_db_delete(session)
except:
ecount += 1
log.critical("error occurred in after_db_delete: obj=%r",
obj, exc_info=True)
self.deleted.clear()
if self.dirty:
for obj in self.dirty:
try:
obj.after_db_update(session)
except:
ecount += 1
log.critical("error occurred in after_db_update: obj=%r",
obj, exc_info=True)
self.dirty.clear()
if ecount:
raise RuntimeError("%r object error during after_commit() ... "
"see traceback for more" % ecount)
def after_rollback(self, session):
self.new.clear()
self.deleted.clear()
self.dirty.clear()
# then add "extension=TrackerExtension()" to the Session constructor
this seems to be a bit challenging, Im curious if a sql trigger AFTER DELETE might be the best route for this, granted it won't be dry and Im not sure the sql database you are using supports it, still AFAIK sqlalchemy pushes transactions to the db but it really doesn't know when they have being committed, if Im interpreting this comment correctly:
its the database server itself that maintains all "pending" data in an ongoing transaction. The changes aren't persisted permanently to disk, and revealed publically to other transactions, until the database receives a COMMIT command which is what Session.commit() sends.
taken from SQLAlchemy: What's the difference between flush() and commit()? by the creator of sqlalchemy ...
If your SQLAlchemy backend supports it, enable two-phase commit. You will need to use (or write) a transaction model for the filesystem that:
checks permissions, etc. to ensure that the file exists and can be deleted during the first commit phase
actually deletes the file during the second commit phase.
That's probably as good as it's going to get. Unix filesystems, as far as I know, do not natively support XA or other two-phase transactional systems, so you will have to live with the small exposure from having a second-phase filesystem delete fail unexpectedly.