I have a couple of models that I want to update at the same time. First I get their data from the DB with a simple:
s = Store.get(Store.id == store_id)
new_book = Book.get(Book.id == data['book_id'])
old_book = Book.get(Book.id == s.books.id)
The actual schema is irrelevant here. Then I do some updates to these models and at the end I save all three of them with:
s.save()
new_book.save()
old_book.save()
The function that handles these operations uses the @db.atomic() decorator, so the writes are bundled into a single transaction. The problem is: what if, between the point where I get() the data from the DB and the point where I save the modified data, another process has already changed something in these rows? Is there a way to execute those writes (the .save() operations) only if the underlying DB rows have not been changed? I could read their last_changed value, but again, is there a way to check and update at the same time, and simply throw an exception if the data has been changed?
It turns out there is a solution for this in the official docs, called Optimistic Locking.
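The gist of the recipe is to add a version column and make the UPDATE conditional on it, raising when no row matches. Below is a rough, hand-rolled sketch of that idea (not the exact mixin from the docs; Book, ConflictError and save_book_optimistic are illustrative names, and SQLite stands in for whatever database you use):

from peewee import Model, CharField, IntegerField, SqliteDatabase

db = SqliteDatabase(':memory:')  # stand-in database for the sketch

class ConflictError(Exception):
    pass

class Book(Model):
    title = CharField()
    version = IntegerField(default=1)  # bumped on every successful save

    class Meta:
        database = db

def save_book_optimistic(book):
    # Write the row only if nobody changed it since we read it.
    rows = (Book
            .update(title=book.title, version=book.version + 1)
            .where((Book.id == book.id) & (Book.version == book.version))
            .execute())
    if rows == 0:
        # The WHERE clause matched nothing: the row was modified (or deleted)
        # by another process after we read it.
        raise ConflictError('Book %s was changed concurrently' % book.id)
    book.version += 1

Inside a @db.atomic() block you would call save_book_optimistic() for each model instead of plain .save(), and let the exception roll back the whole transaction.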
The code is quite simple, as follows:
from pony.orm import Required, Set, Optional, PrimaryKey
from pony.orm import Database, db_session
import time

db = Database('mysql', host="localhost", port=3306, user="root",
              passwd="123456", db="learn_pony")

class TryUpdate(db.Entity):
    _table_ = "try_update_record"
    t = Required(int, default=0)

db.generate_mapping(create_tables=True)

@db_session
def insert_record():
    new_t = TryUpdate()

@db_session
def update():
    t = TryUpdate.get(id=1)
    print t.t
    t.t = 0
    print t.t

if __name__ == "__main__":
    insert_record()
    update()
pony.orm reports an exception: pony.orm.core.CommitException: Object TryUpdate[1] was updated outside of current transaction. But there is no other transaction running at all.
And as my experiments show, Pony works OK as long as t.t is changed to a value different from the original, but it always reports the exception when t.t is set to a value equal to the original.
I'm not sure if this is a design decision. Do I have to check whether my input value has changed every time before the assignment? Or is there anything I can do to avoid this annoying exception?
My Pony version: 0.4.8.
Thanks a lot!
Pony ORM author here.
This behavior is a MySQL-specific bug which was fixed in Pony ORM release 0.4.9, so please upgrade. The rest of my answer explains what caused the bug.
The reason for this bug is the following. In order to prevent lost updates, Pony ORM uses optimistic checks. Pony tracks which attributes were read or changed during program execution and then adds extra conditions to the WHERE section of the corresponding UPDATE query. This way Pony guarantees that no data will be lost because of a concurrent update. Let's consider the following example:
@db_session
def some_function():
    obj = MyObject[123]
    print obj.x
    obj.x = 100
Upon exit from some_function, the @db_session decorator will commit the ongoing transaction. Right before the commit, the object's data will be saved by the following UPDATE command:
UPDATE MyTable
SET x = <new_value>
WHERE id = 123 and x = <old_value>
You may wonder why this additional condition and x = <old_value> was added. It is because Pony knows that the program saw the previous value of the attribute x and may have used this value to calculate the new value of the same attribute. So Pony takes steps to guarantee that this attribute is still unchanged at the moment of the UPDATE. This approach is called an "optimistic concurrency check" (see also the Wikipedia article "optimistic concurrency control"). Since the isolation level used by default in most databases is not SERIALIZABLE, without this additional check it is possible that some other transaction has managed to update the value of the x attribute before our transaction commits, and then the value written by the concurrent transaction would be lost.
When the Python database driver executes the UPDATE query, it returns the number of rows which satisfy the UPDATE criteria. This way Pony knows whether the update was successful or not. If the result is 1, one row was successfully found and updated; if the result is 0, the row was already modified by another transaction and no longer satisfies the criteria in the WHERE section. When this happens, Pony terminates the current transaction in order to prevent a lost update.
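In DB-API terms, the check amounts to roughly the following (a simplified illustration, not Pony's actual code; cursor and connection are assumed to come from the MySQLdb driver):

cursor.execute(
    "UPDATE MyTable SET x = %s WHERE id = %s AND x = %s",
    (new_value, 123, old_value))
if cursor.rowcount == 0:
    # No row satisfied the WHERE clause: a concurrent transaction changed
    # (or deleted) the row, so give up instead of committing.
    connection.rollback()
    raise Exception('Object was updated outside of current transaction')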
The reason for the bug is that while all other database drivers return the number of rows found by the WHERE criteria, the MySQLdb driver by default returns the number of rows which were actually modified! Because of this, if the new value of the attribute turns out to be the same as its original value, MySQLdb reports that 0 rows were modified, and Pony (prior to release 0.4.9) mistakenly concludes that the row was modified by a concurrent transaction. Starting with release 0.4.9, Pony ORM tells the MySQLdb driver to behave in the standard way and return the number of rows which were found, not the number of rows which were actually updated.
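For the curious, the standard behavior can be requested from the MySQLdb driver through the CLIENT.FOUND_ROWS connection flag; something along these lines (this mirrors the idea of the fix, it is not a copy of Pony's code):

import MySQLdb
from MySQLdb.constants import CLIENT

# Ask MySQL to report the number of rows *matched* by the WHERE clause
# rather than the number of rows actually changed.
connection = MySQLdb.connect(
    host='localhost', user='root', passwd='123456', db='learn_pony',
    client_flag=CLIENT.FOUND_ROWS)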
Hope this helps :)
P.S. I found your question just by chance; in order to reliably get answers about Pony ORM, I recommend sending questions to our mailing list http://ponyorm-list.ponyorm.com. If you think you have found a bug, you can open an issue here: https://github.com/ponyorm/pony/issues.
Thank you for your question!
I've got this long queryset statement in a view:
contributions = user_profile.contributions_chosen.all()\
.filter(payed=False).filter(belongs_to=concert)\
.filter(contribution_def__left__gt=0)\
.filter(contribution_def__type_of='ticket')
that I use in my template:
context['contributions'] = contributions
Later in that view I make changes (add or remove a record) to the table contributions_chosen, and if I want my context['contributions'] updated I need to requery the database with the same lengthy query:
contributions = user_profile.contributions_chosen.all()\
.filter(payed=False).filter(belongs_to=concert)\
.filter(contribution_def__left__gt=0)\
.filter(contribution_def__type_of='ticket')
and then update my context again:
context['contributions'] = contributions
So I was wondering if there's any way I can avoid repeating myself and re-evaluate contributions so that it actually reflects the real data in the database.
Ideally I would modify the queryset contributions and its values would be updated, and at the same time the database would reflect these changes, but I don't know how to do this.
UPDATE:
This is what I do between the two context['contributions'] = contributions lines. I add a new contribution object to contributions_chosen (this is an M2M relation):
contribution = Contribution.objects.create(kwarg=something,kwarg2=somethingelse)
user_profile.contributions_chosen.add(contribution)
contribution.save()
user_profile.save()
And in some cases I delete a contribution object:
contribution = user_profile.contributions_chosen.get(id=1)
user_profile.contributions_chosen.get(id=request.POST['con
contribution.delete()
As you can see, I'm modifying the table contributions_chosen, so I have to reissue the query and update the context.
What am I doing wrong?
UPDATE
After seeing your comments about evaluating, I realize I do evaluate the queryset: I call len(contributions) between the two context['contributions'] = contributions lines, and that seems to be the problem.
I'll just move it after the database operations and that's it. Thanks, guys.
update
It seems you have not evaluated the queryset contributions, thus there is no need to worry about updating it, because it has not yet fetched data from the DB.
Can you post the code between the two context['contributions'] = contributions lines? Normally, before you evaluate the queryset contributions (for example by iterating over it or calling its __len__()), it does not contain anything read from the DB, hence you don't have to update its content.
To re-evaluate a queryset, you could:
# make a clone
contributions = contributions._clone()
# or use any op that makes a clone, for example
contributions = contributions.filter()
# or clear its cache
contributions._result_cache = None
# you could even directly add new items to contributions._result_cache,
# but that could cause unexpected behavior if you're not careful
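Given that, the simplest fix for the situation in the question is ordering: build the queryset, do the writes, and evaluate only afterwards. A rough sketch of that flow, reusing the names from the question:

# Build the queryset; it is lazy, so no SQL is executed yet.
contributions = user_profile.contributions_chosen.filter(
    payed=False,
    belongs_to=concert,
    contribution_def__left__gt=0,
    contribution_def__type_of='ticket',
)

# ... add/remove records in contributions_chosen here ...

context['contributions'] = contributions   # still not evaluated
number_available = len(contributions)      # evaluation happens here, after the writes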
I don't know how you can avoid re-evaluating the query, but one way to save some repeated statements in your code would be to create a dict with all those filters and to specify the filter args as a dict:
query_args = dict(
    payed=False,
    belongs_to=concert,
    contribution_def__left__gt=0,
    contribution_def__type_of='ticket',
)
and then
contributions = user_profile.contributions_chosen.filter(**query_args)
This just removes some repeated code, but does not solve the repeated query. If you need to change the args, just handle query_args as a normal Python dict; it is one, after all :)
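For example, tweaking the filters for a second query is just ordinary dict manipulation (purely illustrative):

# Copy the base filters, overriding or dropping conditions as needed.
paid_args = dict(query_args, payed=True)      # same filters, but paid ones
del paid_args['contribution_def__left__gt']   # drop one condition entirely

paid_contributions = user_profile.contributions_chosen.filter(**paid_args)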
I'm writing a quick and dirty maintenance script to delete some rows and would like to avoid having to bring my ORM classes/mappings over from the main project. I have a query that looks similar to:
addresses_table = Table('address', metadata, autoload=True)
addresses = session.query(addresses_table).filter(addresses_table.c.retired == 1)
According to everything I've read, if I was using the ORM (not 'just' tables) and passed in something like:
addresses = session.query(Addresses).filter(addresses_table.c.retired == 1)
I could add a .delete() to the query, but when I try to do this using only tables I get a complaint:
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 2146, in delete
target_cls = self._mapper_zero().class_
AttributeError: 'NoneType' object has no attribute 'class_'
Which makes sense, as it's a table, not a class. I'm quite green when it comes to SQLAlchemy; how should I be going about this?
Looking through some code where I did something similar, I believe this will do what you want.
d = addresses_table.delete().where(addresses_table.c.retired == 1)
d.execute()
Calling delete() on a table object gives you a sql.expression (if memory serves) that you then execute. I've assumed above that the table's metadata is bound to an engine, which means you can just call execute() on the statement. If not, you can pass d to a connection's execute(d).
See docs here.
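For completeness, here is what the explicit-connection variant might look like when nothing is bound (a sketch; the engine URL is made up):

from sqlalchemy import create_engine, MetaData, Table

engine = create_engine('sqlite:///addresses.db')  # illustrative URL
metadata = MetaData()
addresses_table = Table('address', metadata, autoload=True, autoload_with=engine)

d = addresses_table.delete().where(addresses_table.c.retired == 1)
conn = engine.connect()
try:
    conn.execute(d)
finally:
    conn.close()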
When you call delete() from a query object, SQLAlchemy performs a bulk deletion. And you need to choose a strategy for the removal of matched objects from the session. See the documentation here.
If you do not choose a strategy for the removal of matched objects from the session, then SQLAlchemy will try to evaluate the query’s criteria in Python straight on the objects in the session. If evaluation of the criteria isn’t implemented, an error is raised.
This is what is happening with your deletion.
If you only want to delete the records and do not care about the records in the session after the deletion, you can choose the strategy that ignores the session synchronization:
address_table = Table('address', metadata, autoload=True)
addresses = session.query(address_table).filter(address_table.c.retired == 1)
addresses.delete(synchronize_session=False)
I'm just using SQLAlchemy Core, and cannot get the SQL to include my WHERE clauses. I would like this very generic update code to work on all my tables. The intent is that this is part of a generic insert/update function that corresponds to every table. Doing it this way allows for extremely brief test code and simple CLI utilities that can simply pass all args and options without the complexity of separate sub-commands for each table.
It'll take a few more tweaks to get it there, but it should be doing the updates just fine now. However, while SQLAlchemy refers to generative queries, it doesn't distinguish between selects and updates. I've reviewed the SQLAlchemy documentation, Essential SQLAlchemy, Stack Overflow, and several source code repositories, and have found nothing.
u = self._table.update()
non_key_kw = {}
for column in self._table.c:
    if column.name in self._table.primary_key:
        u.where(self._table.c[column.name] == kw[column.name])
    else:
        col_name = column.name
        non_key_kw[column.name] = kw[column.name]

print u
result = u.execute(kw)
Which fails - it doesn't seem to recognize the where clause:
UPDATE struct SET year=?, month=?, day=?, distance=?, speed=?, slope=?, temp=?
FAIL
And I can't find any examples of building up an update in this way. Any recommendations?
the "where()" method is generative in that it returns a new Update() object. The old one is not modified:
u = u.where(...)
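Applied to the loop in the question, that means assigning the result of each where() call back to u; roughly (keeping the question's names, untested sketch):

u = self._table.update()
non_key_kw = {}
for column in self._table.c:
    if column.name in self._table.primary_key:
        # where() returns a NEW Update object; keep the returned value.
        u = u.where(self._table.c[column.name] == kw[column.name])
    else:
        non_key_kw[column.name] = kw[column.name]

result = u.execute(kw)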
I have a question regarding SQLAlchemy. I have a database which contains Items; every Item has multiple Records assigned to it (1:n). A Record is partially stored in the database, but it also has an assigned file (1:1) on the filesystem.
What I want to do is to delete the assigned file when the Record is removed from the database. So I wrote the following MapperExtension:
class _StoredRecordEraser(MapperExtension):
    def before_delete(self, mapper, connection, instance):
        instance.erase()
The following code creates an experimental setup (full code is here: test.py):
session = Session()
i1 = Item(id='item1')
r11 = Record(id='record11', attr='1')
i1.records.append(r11)
r12 = Record(id='record12', attr='2')
i1.records.append(r12)
session.add(i1)
session.commit()
And finally, my problem... The following code works OK and the old.erase() method is called:
session = Session()
i1 = session.query(Item).get('item1')
old = i1.records[0]
new = Record(id='record13', attr='3')
i1.records.remove(old)
i1.records.append(new)
session.commit()
But when I change the id of the new Record to record11, which is already in the database (though it is not the same record, since attr='3'), old.erase() is not called. Does anybody know why?
Thanks
A delete + insert of two records that ultimately have the same primary key within a single flush is converted into a single UPDATE right now. This is not the best behavior; it really should delete and then insert, so that the various events attached to those activities are triggered as expected (not just mapper extension methods, but database-level defaults too). But the flush() process is hardwired to perform inserts/updates first, then deletes. As a workaround, you can issue a flush() after the remove/delete operation, then a second one for the add/insert.
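Concretely, for the failing case above the workaround might look like this (a sketch that assumes the same cascade setup as the question's test.py):

session = Session()
i1 = session.query(Item).get('item1')
old = i1.records[0]

i1.records.remove(old)
session.flush()    # first flush: the DELETE is emitted, before_delete fires

new = Record(id='record11', attr='3')   # reuses the old primary key
i1.records.append(new)
session.commit()   # second flush: the INSERT is performed separately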
As for the flush's current behavior, I've looked into trying to break this out, but it gets very complicated: inserts which depend on deletes would have to execute after the deletes, while updates which depend on inserts would have to execute beforehand. Ultimately, the unitofwork module would have to be rewritten (big time) to consider all inserts/updates/deletes in a single stream of dependent actions that would be topologically sorted against each other. This would simplify the methods used to execute statements in the correct order, although all-new systems for synchronizing data between rows based on server-level defaults would have to be devised, and it's possible that complexity would be re-introduced if it turned out the "simpler" method spent too much time naively sorting insert statements that are known at the ORM level not to require any sorting against each other. The topological sort works at a more coarse-grained level than that right now.