There is a student whose type attribute is 4; the minimum allowed value for type is 1, enforced by a constraint.
In PostgreSQL
In session 1, I exclusively lock and update a row in the student table:
BEGIN;
LOCK TABLE students IN ROW EXCLUSIVE MODE;
SELECT * FROM students WHERE id = 122 FOR UPDATE;
UPDATE students SET type = 1 WHERE id = 122;
END;
In session 2 I concurrently run:
UPDATE students SET type = type - 1 WHERE id = 122;
The result I get in session 2 is an exception: the student's type can't be lower than 1, since I had already set the same student's type to 1 in session 1, and because session 1 held an exclusive lock on that row, session 2 had to wait for it to finish before applying its own update.
In Flask-SQLAlchemy
I tried to recreate the same result with a Student record whose type attribute is set to 4 by default.
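Roughly, the model looks like the following sketch (names simplified; the minimum is enforced with a CHECK constraint, which is what raises the integrity error below):
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Student(db.Model):
    __tablename__ = 'students'
    # CHECK constraint enforcing the minimum value of 1
    __table_args__ = (db.CheckConstraint('type >= 1', name='students_type_min'),)

    id = db.Column(db.Integer, primary_key=True)
    type = db.Column(db.Integer, nullable=False, default=4)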
In session 1:
user = Student.query.with_for_update(of=Student, nowait=True).filter(Student.id == 122).first()
user.type = 1
db.session.commit()
In session 2:
user = Student.query.filter(Student.id == 122).first()
user.type -= 1
db.session.commit()
The result I get is that the user's type equals 3, whereas I should get an exception. The changes from session 1 are overridden by those in session 2, even though db.session.commit() in session 2 waits until the transaction in session 1 is over.
But in session 2 when I run this with session 1 concurrently:
user = Student.query.filter(Student.id == 122).update({"type": Student.type - 1})
db.session.commit()
I get the right output, i.e. an integrity error showing an attempt to set student 122's type attribute to 0 (session 1's result is not overridden).
I would like to know why this happens.
The 2 sessions should look like this:
user = Student.query.with_for_update(of=Student, nowait=True).filter(Student.id == 122).first()
user.type = 1
db.session.commit()
and
user = Student.query.with_for_update(of=Student, nowait=True).filter(Student.id == 122).first()
user.type -= 1
db.session.commit()
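Note that with nowait=True, a session that cannot take the row lock immediately raises an error instead of blocking, so you will want to catch that. A sketch (with psycopg2, the lock failure surfaces as an OperationalError):
from sqlalchemy.exc import OperationalError

try:
    user = Student.query.with_for_update(of=Student, nowait=True) \
        .filter(Student.id == 122).first()
    user.type -= 1
    db.session.commit()
except OperationalError:
    # "could not obtain lock on row": roll back and retry later
    db.session.rollback()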
In order for FOR UPDATE to work properly, all involved transactions which intend to update the row need to use it.
In your example, session 2 is not using with_for_update. Since you didn't tell it to use FOR UPDATE, it is free to read the old value of the row (the new value has not yet been committed, and row locks do not block plain readers), modify that in-memory value, and write it back.
If you don't want to use FOR UPDATE everywhere you read a row with the intention of changing it, you could instead use the serializable isolation level everywhere. If you do, however, transactions might not block; instead they will appear to succeed until the commit, then throw serialization errors that need to be caught and dealt with (typically by retrying).
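A minimal sketch of that approach, assuming a recent Flask-SQLAlchemy where engine options can be passed through config:
from sqlalchemy.exc import OperationalError

# app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {'isolation_level': 'SERIALIZABLE'}

def decrement_type(student_id):
    try:
        user = Student.query.filter(Student.id == student_id).first()
        user.type -= 1
        db.session.commit()
    except OperationalError:
        # serialization_failure (SQLSTATE 40001): roll back and
        # retry the whole transaction
        db.session.rollback()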
Note: Your pre-edit example should have worked as both sessions were labelled with with_for_update.
Related
I am creating a testing web application. Each time a user submits an answer, either the correct or the incorrect method is called, giving the user passed in +1 to totalattempted in that user's row.
def incorrect(user):
    u = user
    u.totalattempted += 1
    session.commit()
    u.score = round(u.totalcorrect * u.totalcorrect / u.totalattempted)
    session.commit()

def correct(user):
    u = user
    u.totalattempted += 1
    session.commit()
    u.totalcorrect += 1
    session.commit()
    u.score = round(u.totalcorrect * u.totalcorrect / u.totalattempted)
    session.commit()
The issue I have is that when the correct method is called several times per second, the server correctly gives the user +1 to totalcorrect but does not give +1 to totalattempted. I almost feel this issue may be server-wide, as there is also about a 30-second delay between a user's account being created and the server detecting that account. Thanks!
Your code is prone to race conditions. Incrementing the value on a model reads the current value into Python and then updates the database with the new value; it does not perform an atomic update. To do that:
u.totalattempted = User.totalattempted + 1
The same should be done for totalcorrect.
Also, you probably shouldn't commit between updating each field. That causes SQLAlchemy to flush the session and reload the record when you update totalcorrect.
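Putting both points together, the two functions could look roughly like this sketch (it keeps the names from the question and assumes PostgreSQL-style UPDATE semantics, where every SET expression sees the pre-update row values, which is why the score expressions add 1 explicitly):
from sqlalchemy import func

def incorrect(user):
    # Rendered as one UPDATE with server-side arithmetic:
    # SET totalattempted = totalattempted + 1, score = round(...)
    user.totalattempted = User.totalattempted + 1
    # note: with integer columns the division may truncate, depending on the backend
    user.score = func.round(User.totalcorrect * User.totalcorrect
                            / (User.totalattempted + 1))
    session.commit()

def correct(user):
    user.totalattempted = User.totalattempted + 1
    user.totalcorrect = User.totalcorrect + 1
    user.score = func.round((User.totalcorrect + 1) * (User.totalcorrect + 1)
                            / (User.totalattempted + 1))
    session.commit()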
(Sorry in advance for the long question. I tried to break it up into sections to make it clearer what I'm asking. Please let me know if I should add anything else or reorganize it at all.)
Background:
I'm writing a web crawler that uses a producer/consumer model with jobs (pages to crawl or re-crawl) stored in a postgresql database table called crawler_table. I'm using SQLAlchemy to access and make changes to the database table. The exact schema is not important for this question. The important thing is that I (will) have multiple consumers, each of which repeatedly selects a record from the table, loads the page with phantomjs, and then writes information about the page back to the record.
It can happen on occasion that two consumers select the same job. This is not itself a problem; however, it is important that if they update the record with their results simultaneously, that they make consistent changes. It's good enough for me to just find out if an update would cause the record to become inconsistent. If so, I can deal with it.
Investigation:
I initially assumed that if two transactions in separate sessions read then updated the same record simultaneously, the second one to commit would fail. To test that assumption, I ran the following code (simplified slightly):
SQLAlchemySession = sessionmaker(bind=create_engine(my_postgresql_uri))
class Session(object):
    # A simple wrapper for use with the `with` statement
    def __enter__(self):
        self.session = SQLAlchemySession()
        return self.session

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            self.session.rollback()
        else:
            self.session.commit()
        self.session.close()
with Session() as session:  # Create a record to play with
    if session.query(CrawlerPage) \
            .filter(CrawlerPage.url == 'url').count() == 0:
        session.add(CrawlerPage(website='website', url='url',
                                first_seen=datetime.utcnow()))
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    page.failed_count = 0
# commit

# Actual experiment:
with Session() as session:
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    print 'initial (session)', page.failed_count
    # 0 (expected)
    page.failed_count += 5
    with Session() as other_session:
        same_page = other_session.query(CrawlerPage) \
            .filter(CrawlerPage.url == 'url') \
            .one()
        print 'initial (other_session)', same_page.failed_count
        # 0 (expected)
        same_page.failed_count += 10
        print 'final (other_session)', same_page.failed_count
        # 10 (expected)
    # commit other_session, no errors (expected)
    print 'final (session)', page.failed_count
    # 5 (expected)
# commit session, no errors (why?)

with Session() as session:
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    print 'final value', page.failed_count
    # 5 (expected, given that there were no errors)
(Apparently Incorrect) Expectations:
I would have expected that reading a value from a record then updating that value within the same transaction would:
Be an atomic operation. That is, either succeed completely or fail completely. This much appears to be true, since the final value is 5, the value set in the last transaction to be committed.
Fail if the record being updated is updated by a concurrent session (other_session) upon attempting to commit the transaction. My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit. In these circumstances, the two transactions read then update the same value of the same record. In a version-control system, this would be the equivalent of a merge conflict. Obviously databases are not the same as version-control systems, but they have enough similarities to inform some of my assumptions about them, for better or worse.
Questions:
Why doesn't the second commit raise an exception?
Am I misunderstanding something about how SQLAlchemy handles transactions?
Am I misunderstanding something about how postgresql handles transactions? (This one seems most likely to me.)
Something else?
Is there a way to get the second commit to raise an exception?
PostgreSQL has SELECT ... FOR UPDATE, which SQLAlchemy seems to support.
My rationale is that all transactions should behave as though they are
performed independently in order of commit whenever possible, or
should fail to commit.
Well, in general there's a lot more to transactions than that. PostgreSQL's default transaction isolation level is "read committed". Loosely speaking, that means multiple transactions can simultaneously read committed values from the same rows in a table. If you want to prevent that, set the transaction isolation level to serializable (and be prepared to handle serialization failures), use select ... for update, lock the table, or use a column-by-column WHERE clause, among other options.
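The column-by-column WHERE clause is essentially optimistic locking: repeat the values you read in the UPDATE's WHERE clause and check how many rows matched. A sketch in SQLAlchemy terms, using the CrawlerPage model from the question:
# expected_count is the failed_count we read earlier in this transaction
rows = session.query(CrawlerPage) \
    .filter(CrawlerPage.url == 'url',
            CrawlerPage.failed_count == expected_count) \
    .update({'failed_count': expected_count + 5},
            synchronize_session=False)
session.commit()
if rows == 0:
    # Someone else changed the row since we read it: re-read and retry.
    pass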
You can test and demonstrate transaction behavior by opening two psql connections.
Session 1:

begin transaction;

select *
from test
where pid = 1
and date = '2014-10-01'
for update;
(1 row)

Session 2:

begin transaction;

select *
from test
where pid = 1
and date = '2014-10-01'
for update;
(waiting)

Session 1:

update test
set date = '2014-10-31'
where pid = 1
and date = '2014-10-01';

commit;

Session 2:

-- Locks released. The waiting SELECT ... FOR UPDATE now returns,
-- but finds no rows because the row's date changed:
(0 rows)
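The SQLAlchemy equivalent of the SELECT ... FOR UPDATE above is with_for_update() (with_lockmode('update') in older versions); a sketch against the question's Session wrapper:
with Session() as session:
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .with_for_update() \
        .one()
    # The row is now locked until this transaction commits or rolls back;
    # a concurrent with_for_update() on the same row blocks here.
    page.failed_count += 5
# commit releases the lock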
I have this running on a live website. When a user logs in, I query his profile to see how many "credits" he has available. Credits are purchased via PayPal. If a person buys credits and the payment comes through, the query still shows 0 credits, even though running the same query in phpMyAdmin brings the right result. If I restart the Apache web server and reload the page, the right number of credits is shown. Here's my mapper code, which computes the number of credits each user has:
mapper(User, users_table, order_by='user.date_added DESC, user.id DESC', properties={
    'userCreditsCount': column_property(
        select(
            [func.ifnull(func.sum(orders_table.c.quantity), 0)],
            orders_table.c.user_id == users_table.c.id
        ).where(and_(
            orders_table.c.date_added > get_order_expire_limit(),  # order must not be older than a month
            orders_table.c.status == STATUS_COMPLETED
        )).label('userCreditsCount'),
        deferred=True
    ),
    # other properties....
})
I'm using SQLAlchemy with the Flask framework, but not the Flask-SQLAlchemy package (just pure SQLAlchemy).
Here's how I initiate my database:
engine = create_engine( config.DATABASE_URI, pool_recycle = True )
metadata = MetaData()
db_session = scoped_session( sessionmaker( bind = engine, autoflush = True, autocommit = False ) )
I learned both Python and SQLAlchemy on this project, so I may be missing things, but this one is driving me nuts. Any ideas?
When you work with a Session, as soon as it starts working with a connection, it holds onto that connection until commit(), rollback() or close() is called. With the DBAPI, the connection to the database also remains in a transaction until the transaction is committed or rolled back.
In this case, when you've loaded data into your session, SQLAlchemy doesn't refresh the data until the transaction is ended (or if you explicitly expire some part of the data with expire()). This is the natural behavior to have, since due to transaction isolation, it's very likely that the current transaction cannot see changes that have occurred since that transaction started in any case.
So while using expire() or refresh() may or may not be part of how to get the latest data into your Session, really you need to end your transaction and start a new one to truly see what's been changed elsewhere since it started. You should organize your application so that a particular Session() is ready to go when a new request comes in, but when that request completes, the Session() should be closed out and a new one (or at least a new transaction) started up on the next request.
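With the scoped_session shown in the question, one way to arrange that in Flask is to remove the session when each request ends. A sketch, assuming app is your Flask application object (scoped_session.remove() closes out the session, rolling back anything left open):
@app.teardown_appcontext
def cleanup_session(exception=None):
    # Close the request's session and return its connection to the pool.
    db_session.remove()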
Please try to call refresh or expire on your object before accessing the field userCreditsCount:
user1 = session.query(User).get(1)
# ...
session.refresh(user1, ('userCreditsCount',))
This will make the query execute again (when refresh is called).
However, depending on the isolation mode your transaction uses, it might not resolve the problem, in which case you might need to commit/rollback the transaction (session) in order for the query to give you a new result.
Lifespan of a Contextual Session
I'd make sure you're closing the session when you're done with it.
session = db_session()
try:
    return session.query(User).get(5)
finally:
    session.close()
Set sessionmaker's autocommit to True and see if that helps. According to the documentation, the session uses
the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching.
so in your code it would become:
sessionmaker(bind = engine, autoflush = True, autocommit = True)
checkSql = 'SELECT userid FROM bs_members WHERE userid = :checkUser'
doesUserExist = False
while True:
    doesUserExist = False
    newUser.userID = ga.getInput('Enter userID: ', "\w+$")
    checkUserID = ds.execute(checkSql, checkUser=newUser.userID)
    for row in ds:
        if row == checkUserID:
            doesUserExist = True
            print 'That user name is already in use. Please enter a new username.'
            break
    if doesUserExist == False:
        break
    else:
        continue
I am using the cx_Oracle module with Python 2.7. I am trying to prompt the user to enter a userID. The program will then check if the userID already exists and if it does prompt the user for a different userID. The execute method is a helper method that uses the execute method from cx_Oracle to interact with the Oracle database. The getInput method prompts the user for input that is then checked against the regular expression.
I know I have this wrong, but here is what I believe should happen: when the while loop starts, the user is prompted for a userID, which is then checked against the database. The for loop starts and checks whether the row returned by ds.execute() is the same as the userID provided by the user. If it is, the user is told to use another user name and the break exits the for loop. The if statement then checks whether a matching user exists; if not, it breaks the while loop. Otherwise the while loop iterates, so the user is prompted to enter a non-existent userID.
What actually happens is that the user is prompted for the userID, then none of the checking ever appears to happen, and the program moves on to the next piece of code. What am I missing here? I have included a link to the docs for execute(). The execute method in the above code is part of the following helper method:
def execute(self, statement, **parameters):
    if parameters is None:
        self._curs.execute(statement)
    else:
        self._curs.execute(statement, parameters)
If I need to provide more information let me know.
Edit: I forgot the line doesUserExist = False immediately after the beginning of the while loop, so I added it.
Your custom execute method doesn't return anything, meaning that checkUserID in your code will always be None.
Furthermore, what you're really interested in is whether the query returns at least one row. If it returns none, the userID is available.
The docs say that calling .fetchone() returns None if no more rows are available. You can use that.
checkSql = 'SELECT userid FROM bs_members WHERE userid = :checkUser'

while True:
    newUser.userID = ga.getInput('Enter userID: ', "\w+$")
    ds.execute(checkSql, checkUser=newUser.userID)
    if ds.fetchone() is None:
        # This userID is available.
        break
    else:
        print 'That user name is already in use. Please enter a new username.'
I'm assuming here that ds is an instance of Cursor, or a subclass thereof.
At the least, you should have the line doesUserExist = False at the beginning of the while loop. Otherwise, if the user enters an existing ID once, it will keep looping forever.
So I'm creating a Django app that allows a user to add a new line of text to an existing group of text lines. However, I don't want multiple users adding lines to the same group concurrently, so I created a BoolField isBeingEdited that is set to True once a user decides to append a specific group. Once the bool is True, no one else can append the group until the edit is submitted, whereupon the bool is set to False again. This works all right, unless someone decides to make an edit and then changes their mind or forgets about it, etc. I want isBeingEdited to flip back to False after 10 minutes or so. Is this a job for cron, or is there something easier out there? Any suggestions?
Change the boolean to a "lock time"
To lock the model, set the Lock time to the current time.
To unlock the model, set the lock time to None
Add an "is_locked" method. That method returns "not locked" if the current time is more than 10 minutes after the lock time.
This gives you your timeout without cron and without regularly hitting the DB to check flags and unset them. Instead, the time is only checked if you are interested in whether this model is locked. A cron job would likely have to check all models.
from django.db import models
from datetime import datetime, timedelta

# Create your models here.
class yourTextLineGroup(models.Model):
    # fields go here
    lock_time = models.DateTimeField(null=True)
    locked_by = models.ForeignKey()  # Point me to your user model

    def lock(self):
        if self.is_locked():  # and code here to see if current user is not locked_by user
            # exception / bad return value here
            pass
        self.lock_time = datetime.now()

    def unlock(self):
        self.lock_time = None

    def is_locked(self):
        return self.lock_time and datetime.now() - self.lock_time < timedelta(minutes=10)
Code above assumes that the caller will call the save method after calling lock or unlock.
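Usage would look something like this sketch (group_id is a placeholder; note there is still a small window between the is_locked() check and the save, which wrapping the lookup in transaction.atomic() with select_for_update() can close):
group = yourTextLineGroup.objects.get(pk=group_id)
if not group.is_locked():
    group.lock()
    group.save()
    # ... let the user append their line ...
    group.unlock()
    group.save()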